The single most reliable agentic pattern in production has nothing to do with a smarter model. It is a separation of duties: one sub-agent produces the work, and a second, independent sub-agent verifies it — ideally adversarially, prompted not to bless the output but to break it. This maker-checker split is the antidote to the failure mode that quietly kills most agent deployments: the "almost right" output that sails through because the agent grading it is the same agent that wrote it, and an agent reviewing its own work is biased toward approving it. Connect a model directly to a real-world action — publishing, deploying, paying — with no independent checker in between, and you are one hallucination away from a PR crisis or an outage.
Why Self-Review Fails
Ask an agent to write something and then ask the same agent "is this correct?" and you will almost always get "yes." This is not a bug you can prompt your way out of — it is a structural property of how these models behave. An agent reviewing its own output is operating with a sycophancy bias toward itself: the work is already in its context as the thing it just decided to produce, and the path of least resistance is to confirm the decision it already made. The model that just argued for an approach is the worst possible judge of whether that approach was right.
This is the same sycophancy dynamic that distorts decisions all the way up to the executive suite, and it is why "almost right" is the most dangerous output category in software. We covered the trust consequences in depth in our analysis of the "almost right" problem and the AI code trust crisis: output that is 90% correct is more dangerous than output that is obviously wrong, because the wrongness is buried and an under-motivated reviewer will not dig for it. A self-reviewing agent is the most under-motivated reviewer imaginable. It has every incentive to confirm and none to refute.
"An agent grading its own work is not a check. It is a confirmation ritual. The only review that catches errors is one performed by something that did not produce the work and is rewarded for finding problems in it."
The Maker-Checker Pattern
Maker-checker is borrowed, deliberately, from a domain that has taken verification seriously for centuries: finance. In banking, the person who initiates a transaction cannot be the person who approves it. The separation is not about distrust of any individual — it is a structural acknowledgment that a single actor checking their own work is a single point of failure. Applied to agents, the maker produces, the checker verifies, and the two are separate processes with separate context and separate incentives.
Three properties make the checker actually work, and skipping any one of them collapses it back into theater.
Independent Context
The checker must not see the maker's reasoning. If you hand the checker the maker's full chain of thought — "here's why I did it this way" — you have anchored the checker on the maker's assumptions, and it will tend to accept the same flawed premises. The checker should see only the requirements and the finished output, evaluated cold. This is what makes it a genuinely independent observer rather than a downstream rationalizer of the maker's choices.
Adversarial Prompting
The framing of the checker's instruction is the difference between a check that works and one that doesn't. "Review this and confirm it's correct" invites a rubber stamp. "Find every way this output fails the requirements; assume there is at least one defect and locate it" inverts the incentive. The model is just as capable of producing a sharp critique as a warm endorsement — but only if you ask for the critique. The sycophancy is a default, not a hard limit, and adversarial framing is how you break it. The same trick applies whether the maker wrote prose, code, or a payment instruction.
Majority Vote
For the highest-stakes outputs, a single checker is still a single point of failure. Running a panel of independent checkers and requiring agreement — or surfacing any disagreement for human review — catches the errors that any one checker would miss. The statistical logic is the same as ensemble methods: independent errors do not correlate, so the probability that three independent checkers all miss the same defect is far lower than the probability that one does.
Why it fails
- • Same agent, same context, same blind spots
- • Sycophantic toward its own decision
- • Incentive to confirm, not to refute
- • "Almost right" sails straight through
- • A confirmation ritual, not a check
Why it works
- • Independent agent, fresh context
- • Prompted to break, not bless
- • Opposing incentives by design
- • Panel vote for high-stakes output
- • Catches the buried 10% error
The Unguarded Agent: A Cautionary Tale
The cost of skipping the checker is not hypothetical. Consider the founder who ran an unguarded "AI growth agent" on X — a bot wired to auto-reply and auto-follow at machine speed, with no checker between the model and the action. It did exactly what it was told, which was the problem: it generated activity at a volume no human could, roughly a thousand replies a day, and the platform's abuse systems did what they are built to do. The founder lost three accounts in two months. The agent was not malfunctioning. It was performing flawlessly at a task that should never have been connected directly to an irreversible action without verification.
The lesson generalizes well beyond social media. Automation that does what no human could do — at a scale and speed no human could match — gets nuked precisely because it does what no human could do. A checker, even a crude one, would have flagged "this volume is outside any plausible human range" before the action fired. The absence of that one gate turned a clever growth hack into three dead accounts.
The same shape of failure recurs across content automation, code deployment, and financial actions. The common thread is always a model connected directly to an action with no independent verification in between. And it compounds with the security surface: an agent with no checker is also an agent with no defense against manipulated inputs, a risk we examined in detail in our piece on how prompt injection remains unsolved. A checker that refutes is also a checker that can catch an output that was steered by a malicious instruction the maker swallowed.
Where to Put Human Gates vs. Automated Checks
Not every check needs a human, and not every action can be trusted to an automated checker. The criterion that cuts cleanly through the decision is reversibility. If an action can be undone cheaply, an automated checker is enough — when it occasionally fails, you roll back and lose little. If an action is irreversible or expensive to reverse, a human gate belongs in the loop, because the cost of a missed defect is now catastrophic rather than annoying.
Notice that the automated checker does not disappear when a human gate is added — it runs first and narrows what the human has to look at. A well-designed loop has the maker produce, the checker refute, and only the outputs that survive the checker get escalated to the human, already annotated with the checker's findings. The human is not re-reviewing everything from scratch; they are adjudicating a pre-filtered, pre-critiqued shortlist. That is what keeps the human gate from becoming a bottleneck that defeats the point of automation.
"Reversibility tells you who reviews. If undoing it is cheap, let an adversarial agent check it. If undoing it is a crisis, a human signs off on the checker's verdict before anything fires."
A Verification-Loop Blueprint
Putting it together, here is the loop that keeps agents safely in production. It is deliberately simple, because the reliability comes from the separation of duties, not from cleverness.
The maker-and-check loop also pairs naturally with the self-improving systems we wrote about in our piece on compounding AI workflows: every checker verdict is a graded signal you can feed back into the maker's playbook, so the maker produces fewer defects over time and the checker has less to catch. Verification is not just a safety gate — it is a source of the training signal that makes the whole loop get better the longer it runs.
How Teams Get Maker-Checker Wrong
The pattern is simple, which is exactly why it gets implemented badly. The shape looks right — there's a checker, a gate, a loop — but one of the load-bearing properties is quietly missing, and the verification degrades into theater that produces a false sense of safety worse than no checker at all. Three mistakes account for almost all of it.
The first is sharing context between maker and checker. It is tempting, for efficiency, to hand the checker the maker's full reasoning so it "understands" the output. This destroys the independence that made the check worth running. Once the checker has absorbed the maker's justifications, it inherits the maker's blind spots and tends to ratify the same flawed premises. The checker must see the requirements and the output — not the story the maker told itself about why the output is good.
The second is a polite checker. If the checker's prompt asks it to "review and provide feedback," it will default to the same agreeable register that makes self-review useless. A checker that opens with "this looks solid overall" has already failed. The instruction has to be adversarial and presumptive: assume a defect exists, find it, and state precisely how the output violates the requirement. A checker that cannot find anything wrong should have to argue why — the burden of proof runs toward suspicion, not approval.
The third mistake is mis-sizing the gate. Two opposite errors live here. One team wires an irreversible action — a payment, a production deploy — to fire automatically on an automated checker's pass, and discovers the cost of a missed defect the hard way. The opposite team routes every trivial, reversible output through a human, drowns the reviewer in low-stakes approvals, and the human rubber-stamps everything within a week because the volume is untenable. Both errors come from ignoring reversibility as the sizing criterion. The gate should be exactly as heavy as the cost of being wrong, and no heavier.
There is a fourth, quieter failure worth naming: treating the checker's pass as a guarantee rather than a filter. A maker-checker loop dramatically reduces the rate of defects reaching the action; it does not drive that rate to zero. The right mental model is defense in depth — the checker catches most of what the maker gets wrong, the human gate catches most of what the checker misses on irreversible actions, and monitoring catches the rest after the fact. Anyone who tells you a single checker makes an agent "safe" is overselling. It makes the agent safer, which, sized correctly against reversibility, is exactly enough to ship.
Conclusion: The Pattern That Earns the Trust
The teams running agents successfully in production are not the ones with the most capable models. They are the ones who refused to let a model grade its own homework. Maker-checker — independent context, adversarial prompting, optional panel vote, and a human gate sized to the irreversibility of the action — is the unglamorous pattern doing the load-bearing work behind every agent deployment that hasn't blown up. It directly answers the "almost right" failure mode by introducing something motivated to find the buried error, and it answers the unguarded-action failure mode by refusing to wire a model straight to a consequence.
The founder who lost three accounts did not lack a good model. He lacked a checker. The fix was never a smarter agent — it was a second, skeptical one, plus a human signature on anything that couldn't be undone. That is the whole pattern, and it is the difference between an agent you can put in production and one that puts you in an incident review.
Tags
Share
Building something like this? See how we ship it or start a project.