Agent Swarms Are Not 10X
Agent swarms are sexy. Multiple AI agents working in parallel, dividing up tasks, building features simultaneously. It sounds incredible. And for certain use cases, it is.
But for most people, most of the time, it's not worth the switch.
I tested agent swarms so you don't have to
I tried out Overstory, and it definitely tore through the specs I gave it. One command, with a well-designed spec, and it took the work to completion. Cool.
But in the end, I dropped the swarm.
It wasn't a bad experience. It just wasn't a big enough improvement to justify changing how I work.
And honestly, I had to be even more paranoid about the outcome because the amount of code flying past me was absurd. To make sure I got the outcome I wanted, I had to spend more time on the inputs and then cross my fingers that I hadn't forgotten anything important. The end result worked, sure, but how do I know the implementation is "correct" and not just "working"?
What Swarms Are Good At
Swarms are great at producing a lot of code. If you're rebuilding an old piece of software from scratch, or doing a clean room copy of an open source project from spec, a swarm can tear through that work. The tasks are well-defined, independent, and the agents can run in parallel to create huge amounts of code. And they can do so without stepping on each other, or at least while reconciling their differences. This has been shown to be possible with Gas Town and Overstory.
The Honest Assessment
Grabbing a tool like Overstory off the shelf and dropping it into a project is easy. The setup cost is low. But Claude can build these features too. A single agent, used well, can do the same work. And it's easier to audit what was completed with a single agent. The swarm didn't produce a meaningfully better outcome for most of the tasks I threw at it. It produced the same output.
Know When to Reach for Swarms
The issue isn't whether swarms work. They absolutely do. The issue is whether they work better enough to justify changing your process. And I wrote a separate post about this. My criterion for changing process is not "somewhat better." It's not 2X. It's 10X.
A swarm for a greenfield rewrite? Maybe that clears the bar.
A swarm for day-to-day feature work on an existing codebase? Probably not. You're adding coordination overhead, debugging complexity, and merge conflicts for a marginal speed gain. Most of the time, the lack of human oversight is going to kill the utility of a swarm.
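One way to see why extra agents alone rarely get you to 10X is an Amdahl's-law-style back-of-envelope estimate. The sketch below is an illustration, not a measurement: the function name `effective_speedup` and every number passed to it are hypothetical, picked only to show the shape of the tradeoff between parallel work, serial human work (writing the spec, reviewing the result), and coordination overhead.

```python
def effective_speedup(parallel_fraction: float, agents: int,
                      overhead_fraction: float = 0.0) -> float:
    """Amdahl-style estimate of end-to-end speedup over a single agent.

    parallel_fraction: share of the original workflow that agents can split up.
    agents: number of agents working on that share at once.
    overhead_fraction: extra coordination/merge/review time the swarm adds,
        expressed as a share of the original total time (an assumption).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if agents < 1:
        raise ValueError("agents must be at least 1")
    if overhead_fraction < 0.0:
        raise ValueError("overhead_fraction must not be negative")

    serial_fraction = 1.0 - parallel_fraction  # spec writing, human review, etc.
    new_time = serial_fraction + parallel_fraction / agents + overhead_fraction
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical day-to-day feature work: a lot of serial human time, some overhead.
    print(effective_speedup(parallel_fraction=0.6, agents=8, overhead_fraction=0.1))
    # Hypothetical greenfield rewrite: almost everything parallelizes.
    print(effective_speedup(parallel_fraction=0.95, agents=20, overhead_fraction=0.02))
```

In the first made-up scenario, the serial 40% caps the gain at 1 / 0.4 = 2.5X no matter how many agents run, and any overhead pulls it lower. That is the gap between "works" and "clears a 10X bar."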
Knowing when to reach for swarms is becoming more and more important. Most of the time, the answer is: don't.
This post is brought to you by myself and Claude Opus 4.6.