Autonomous developer agents: how far should you let them code?
Our agents analyse repositories, fix bugs, write tests and open pull requests. The one rule that makes it safe: they never touch production.
The most polarising agents we run are the ones that write code. They analyse a fleet of repositories, prioritise what needs fixing, patch it, test it — and open a pull request like any developer would.
The boundary that makes it work
The agents’ world ends at the pull request. Merging is a human act. Deploying is a human act. This is not caution theatre: it means the blast radius of a bad agent decision is a rejected PR, not an outage. Reviewers stay in the loop exactly where their judgment matters.
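One way to picture that boundary is as an allowlist checked before every action. The sketch below is illustrative only: the `Action` names, the `AGENT_ALLOWED` set and the `authorize` function are hypothetical, not a description of our actual tooling.

```python
from enum import Enum


class Action(Enum):
    READ_REPO = "read_repo"
    PUSH_BRANCH = "push_branch"
    RUN_TESTS = "run_tests"
    OPEN_PULL_REQUEST = "open_pull_request"
    MERGE = "merge"
    DEPLOY = "deploy"


# Hypothetical allowlist: everything up to and including opening a PR.
# MERGE and DEPLOY are deliberately absent.
AGENT_ALLOWED = frozenset({
    Action.READ_REPO,
    Action.PUSH_BRANCH,
    Action.RUN_TESTS,
    Action.OPEN_PULL_REQUEST,
})


class BoundaryViolation(PermissionError):
    """Raised when an agent attempts an action reserved for humans."""


def authorize(actor: str, action: Action) -> None:
    """Return silently if the action is permitted, otherwise raise."""
    if actor == "agent" and action not in AGENT_ALLOWED:
        raise BoundaryViolation(
            f"agents may not {action.value}; a human must do it"
        )
```

Under these assumptions, `authorize("agent", Action.OPEN_PULL_REQUEST)` passes, while `authorize("agent", Action.MERGE)` raises `BoundaryViolation`. In practice the same effect is usually achieved with repository permissions and branch protection rather than application code; the sketch only makes the rule explicit.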
What they are genuinely good at
- The backlog nobody picks up: dependency bumps, flaky tests, dead code, missing coverage.
- Repetitive migrations applied consistently across dozens of repositories.
- First drafts of fixes where the pattern is known and the risk is low.
Multiply that across a fleet of repositories and the arithmetic gets serious: this kind of work accounts for a meaningful share of the full-time-equivalent workforce we measure across our deployed agents.
Where humans stay irreplaceable
Architecture, trade-offs, naming, product sense — anything where the right answer depends on where the codebase should go, not where it is. Our rule of thumb: agents for the work that has a pattern, humans for the work that sets the pattern.
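The rule of thumb can be written down as a small routing function. This is a minimal sketch under stated assumptions: the `Task` fields, the `HUMAN_ONLY_KINDS` labels and the `route` function are hypothetical names chosen for illustration, and deciding whether a task really has a known pattern or low risk is itself a human judgment that the code takes as input.

```python
from dataclasses import dataclass

# Hypothetical labels for work that sets the pattern rather than follows it.
HUMAN_ONLY_KINDS = frozenset({"architecture", "trade-off", "naming", "product"})


@dataclass(frozen=True)
class Task:
    kind: str            # e.g. "dependency-bump", "migration", "architecture"
    known_pattern: bool  # has this kind of change been done before?
    low_risk: bool       # would a mistake be cheap to catch in review?


def route(task: Task) -> str:
    """Return "agent" or "human" for a task, defaulting to "human"."""
    if task.kind in HUMAN_ONLY_KINDS:
        return "human"
    if task.known_pattern and task.low_risk:
        return "agent"
    # Anything ambiguous stays with a person.
    return "human"
```

For example, `route(Task("dependency-bump", known_pattern=True, low_risk=True))` returns `"agent"`, while a task of kind `"architecture"` returns `"human"` regardless of the other fields. Defaulting to `"human"` keeps the sketch conservative: an agent only takes work that is positively identified as patterned and low risk.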