What We Lost When Contributions Got Cheap
What the marker handshake in KitOps revealed about where AI contribution friction really lives.
A few months ago we added a rule to the KitOps repository. Any file a coding agent modifies has to end with a marker comment. For Go files it looks like this:
// AGENT_MODIFIED: Human review required before merge
Other file types get the same line wrapped in the appropriate comment syntax. The rule is documented in AGENTS.md at the root of the repo. Coding agents that respect AGENTS.md read it before touching anything and follow the rule. CI rejects any pull request that still contains the marker. To merge, a human has to open every modified file and delete that line.
The goal was simple. Force a human to look at every file a coding agent had modified, even if only to remove a comment.
The Less Obvious Half
The marker on contribution is the part people see first. The more interesting part is what AGENTS.md tells a reviewing agent to do.
The pre-review instruction is short. Before reviewing a pull request, search for AGENT_MODIFIED. If you find any, refuse to review. Stop and tell the human to remove the markers and request review again.
This breaks a specific failure mode. Without that check, you can run an entire PR through a coding agent on the contributor side, run the review through a coding agent on the maintainer side, and merge. Both agents look productive. Neither human opened a file. The pre-review instruction turns that loop into an error. Either a human removed the markers, or no review happens.
We call this the marker handshake. Contributors mark, reviewers halt on markers, humans remove markers to clear the path. The handshake fails if either side is missing.
What the Handshake Does Well
CI enforcement matters. The handshake is not asking contributors to be honest. It is making non-removal automatically painful. If a marker remains, the PR fails. The cost of leaving markers is higher than the cost of removing them.
When a coding agent generates markers, the weakest version of the goal is guaranteed. Somebody opened the file and deleted the marker before CI would let the PR through. When no markers are generated, the mechanism is silent and we have no signal either way.
Combined with the pre-review check, the AI-to-AI loop cannot close by accident. To merge agent-generated, agent-reviewed code, a human has to step in at some point.
Whether this constitutes a major improvement depends on how often that loop would otherwise have fired, which we cannot measure. What we can say is that it is no longer possible without somebody choosing to subvert the mechanism.
What the Handshake Cannot Do
The mechanism is doing what we asked it to. Most pull requests arrive with no markers in them. We cannot tell from the outside whether the contributor was not using an agent, or was using one and removed the markers cleanly. We see the same outcome either way: a PR that passes the CI check.
What we cannot tell, when markers do get removed, is whether the review that happened during removal was meaningful.
Opening a file to delete a line is not the same as understanding what the code does. A contributor can delete the marker, scroll up briefly, decide the function looks reasonable, and commit. The marker came off. CI passed. The reviewer agent ran. The PR merged.
This is not a problem the markers were designed to solve. It is the problem the markers revealed.
What We Actually Found
Our intake of contributions did not slow down. It also did not speed up to keep pace with AI. And the contributions, even after human attention, still did not reflect an understanding of the project.
Every project has constraints that are not enforced by its tests. In KitOps, those constraints span the architecture, the patterns the codebase has settled on, and the project’s goals, including what it is deliberately not trying to be. None of this lives in any single function. A coding agent reading the file it is about to modify has access to one slice of the codebase, no model of the project’s direction, and no way to tell which patterns are load-bearing versus incidental. The pull request it produces will compile, pass tests, and read cleanly. It will also, often enough to matter, get the constraint wrong.
The marker on the file does not help. The human attention triggered by removing the marker does not help. The reviewer agent does not catch it because the agent does not have a model of the project either.
We do try to capture these constraints in AGENTS.md. Some of them can be written down. Most cannot, because the constraints that matter most live in patterns people learned by breaking something, or in decisions whose original reasoning has faded. And even the parts we do capture only sharpen the next agent. They do not build contributors. They do not build future maintainers.
The bottleneck is not review effort. The bottleneck is contributor familiarity with the codebase.
What Cheap Contributions Broke
Code review in open source has always served two purposes that we did not distinguish carefully enough until AI made them come apart.
The first purpose is verification. Does this PR do what it claims to do, without breaking existing behavior, introducing security holes, or violating project conventions?
The second purpose is calibration. The act of writing a PR, defending it in review, and incorporating reviewer feedback teaches the contributor what the project values. A contributor whose first three PRs come back with comments about the project’s patterns eventually internalizes them. Their fourth PR is better. The reviewer’s comments do not just fix the PR. They fix the contributor.
These two purposes were tightly coupled when the cost of writing a PR was high. A contributor would not write 200 lines of code unless they cared enough about the project to absorb feedback. The expensive part of contribution was already a filter for the kind of person who would learn.
AI decoupled them. A contributor can now produce 200 lines of code at near-zero cost without any of the learning that used to come with producing 200 lines of code. The verification purpose of review still works, more or less. The calibration purpose has no recipient. Reviewer comments fix this PR, then the next PR shows up and the contributor has often not internalized anything.
The marker handshake addresses verification. It cannot address calibration because calibration was never a property of the review step. It was a property of the contribution step.
Where That Leaves Us
The honest summary is that we have a mechanism that does what it was designed to do, applied to a problem that turns out to be slightly different from the one we set out to solve. Forced human review prevents the worst case. It does not, and cannot, produce contributors who understand the project.
We are still running the handshake. The downside is small and the upside is real for the specific failure mode it addresses. We are also looking at what else could help. Better onboarding documentation might shift some of the calibration work back to a place where AI cannot remove it. Tests that encode architectural invariants would let the verification step catch more of what currently slips through. Neither is a complete answer.
If you are running into the same problem on your project, I would be curious what you have tried. The marker handshake is one thread to pull. There are others. None of them, as far as I can tell, restore the coupling that contribution cost used to enforce for free.
There is a harder question underneath this, which I am still working through: whether what is arriving in our PR queues counts as contribution in the older sense, or whether contributions have quietly turned into tokens. The two look identical at the API boundary, the pull request. They are not the same thing arriving.
