Autonomous developer agents: how far should you let them code?

Our agents analyse repositories, fix bugs, write tests and open pull requests. The one rule that makes it safe: they never touch production.

The most polarising agents we run are the ones that write code. They analyse a fleet of repositories, prioritise what needs fixing, patch it, test it, and open a pull request, as any developer would.

The boundary that makes it work

The agents’ world ends at the pull request. Merging is a human act. Deploying is a human act. This is not caution theatre: it means the blast radius of a bad agent decision is a rejected PR, not an outage. Reviewers stay in the loop exactly where their judgment matters.
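One way to picture this boundary is as a deny-by-default allowlist of actions. The sketch below is only an illustration: the `Action` names, the `AGENT_ALLOWED` set and the `is_permitted` helper are hypothetical and do not describe our actual tooling. In practice, the same effect usually comes from repository permissions rather than application code.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names, chosen for illustration only.
    ANALYSE_REPO = "analyse_repo"
    RUN_TESTS = "run_tests"
    PUSH_BRANCH = "push_branch"
    OPEN_PULL_REQUEST = "open_pull_request"
    MERGE = "merge"
    DEPLOY = "deploy"


# Assumption: the agent's world ends at the pull request,
# so MERGE and DEPLOY are deliberately absent from this set.
AGENT_ALLOWED = frozenset({
    Action.ANALYSE_REPO,
    Action.RUN_TESTS,
    Action.PUSH_BRANCH,
    Action.OPEN_PULL_REQUEST,
})


def is_permitted(actor: str, action: Action) -> bool:
    """Humans may perform any action; every other actor is limited to the allowlist."""
    if actor == "human":
        return True
    # Deny by default: any non-human actor is treated as an agent.
    return action in AGENT_ALLOWED
```

With this shape, `is_permitted("agent", Action.MERGE)` is false while `is_permitted("agent", Action.OPEN_PULL_REQUEST)` is true, so the worst an agent can produce is a pull request that a reviewer rejects.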

What they are genuinely good at

→ The backlog nobody picks up: dependency bumps, flaky tests, dead code, missing coverage.
→ Repetitive migrations applied consistently across dozens of repositories.
→ First drafts of fixes where the pattern is known and the risk is low.

Multiply that across a fleet of repositories and the arithmetic gets serious: this work accounts for a meaningful share of the full-time-equivalent workforce we measure across our deployed agents.

Where humans stay irreplaceable

Architecture, trade-offs, naming, product sense — anything where the right answer depends on where the codebase should go, not where it is. Our rule of thumb: agents for the work that has a pattern, humans for the work that sets the pattern.
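As a rough sketch, the rule of thumb could be written as a routing check: a task goes to an agent only when its kind matches a known pattern and its risk is low, and everything else goes to a human. The `Task` fields, the `KNOWN_PATTERNS` set and the `route` function are assumptions made for this example; the pattern names simply mirror the kinds of work listed earlier.

```python
from dataclasses import dataclass

# Hypothetical labels for work that already has a pattern.
KNOWN_PATTERNS = frozenset({
    "dependency_bump",
    "flaky_test",
    "dead_code",
    "missing_coverage",
    "repetitive_migration",
})


@dataclass(frozen=True)
class Task:
    repo: str
    kind: str  # e.g. "dependency_bump" or "architecture_change"
    risk: str  # assumed scale: "low", "medium" or "high"


def route(task: Task) -> str:
    """Return "agent" for known-pattern, low-risk work; otherwise "human"."""
    if task.kind in KNOWN_PATTERNS and task.risk == "low":
        return "agent"
    return "human"
```

For example, `route(Task("billing-service", "dependency_bump", "low"))` returns `"agent"`, while an architecture change or any high-risk task returns `"human"`. The repository name here is made up.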