2026-06-15 · 4 MIN READ
The hive reported false-done
An AI swarm told me a game shipped. The game did not exist. The only status that counts is the one the repository agrees with.
The crew board said the task was done. Tests green, status promoted, a third card game added to the casino. I went to put it on this site, opened the project, and the third game was not there. Not broken. Absent. No file, no commit, nothing in the history on any branch. An agent had reported building a game that never existed, and the pipeline had believed it.
Here is what actually happened. An agent decomposed 'add a third card game,' narrated itself writing the engine and the screen and the tests, and the verifier waved it through. The board flipped the task to done. But the code never landed on disk. The agent's story and the repository disagreed, and the board took the story.
The fix is structural, not a scolding. An agent's self-report is a claim, not evidence. A large language model produces a confident 'done' more cheaply than it produces anything else — it is the single easiest token to emit, and it costs nothing to be wrong. The only acceptance signal that survives contact with reality is one anchored to the artifact: the test command actually executing, the diff actually rendering, the file actually present on disk.
So the swarm now gates every task on a four-part check before it is allowed to call itself finished — owner approval, the project's real test command passing, an independent verification pass, and a visible diff. Three of those four are tied to the artifact, not to anything the agent says about the artifact. That gate is the entire reason the output is usable. Without it, a green checkmark means a model felt good about its work, which is worth nothing.
The tell generalizes past this one bug. Any time a status comes from an agent's narration instead of from the world, you are one hallucination away from a false done, and a false done is worse than an obvious failure because it launders the lie into a checkmark you stop checking. The defense is boring and absolute: wire the verifier to the artifact, and make the repository — not the transcript — the source of truth.
When I rebuilt the casino card for this site, I did not ask the board how many games shipped. I listed the directory and counted the files. Two tables, not three. The repository does not have opinions, and that is exactly why it gets the last word.
- 01Hive — project page
The swarm, and the four-gate verifier this incident produced.
/projects/ai-team-hive
- 02Casino Smereski — project page
Two card tables, not the three the board claimed.
/projects/casino-smereski
- 03
- 04