Every Claude Code Is a Reactor

We pulled the control rods out and called it productivity

May 22, 2026

*me going to work about to review 10 KLOCs of machine generated code*

Two things happened to me last week.

I shipped a project I previously would have estimated at six weeks. It took five days. Most of it was driven by Claude Code. The output was good, and in places it was better than what I would have written by hand at the speed I would have written it. The reactor heated the grid beautifully.

I also ran a bunch of frontend interviews. Still processing how bad it was.

On purpose, we let frontend candidates use LLMs in our interviews. The whole loop is designed around how you would actually work on the job in 2026, which means with an AI in the room. We hand you a requirements document and we let you prompt whatever model you want. The exercise is: read the requirements, drive the model, judge what it gives you back, ship something that addresses what was asked, and be able to explain it.

They could not read the requirements document.

I do not mean they read it badly. When I asked direct questions about what the document said, with the document still on the screen in front of them, they could not find the answers, or they answered something adjacent, or they restated their prompt back at me.

A few specifics, because the abstract version of this is too easy to dismiss. One candidate spent five minutes debugging a sorting issue. There was no mention of a sorting key in the API docs. Another candidate decided unprompted that they needed mock data instead of using the test API I had set up; the LLM, trying to be helpful, ended up building a layer that combined and deduplicated rows from both, and the candidate could not tell that this was happening until I sat down and walked them through it. A third candidate had a working UI by the end of the session (buttons, list, the map) and could not tell me where in their code the pagination was actually happening.

When the model produced something subtly wrong, none of them could see it. They could not push back on the model, could not say “you gave me Y, the requirements asked for X.” In those sessions, the LLM was the one doing the engineering work. The candidate was a transport layer between document and model, and the transport leaked in both directions.

Those two skills, reading carefully and judging output, are now the entire job.

These candidates were three to five years into their careers and had never been taught what the tool is for, because the apprenticeship that would have taught them got quietly dismantled while they were learning to prompt.

Last month was the 40th anniversary of Chernobyl. My hometown of Omsk, a Siberian city about as far from Pripyat as Lisbon is from Moscow, sent ~1500 people to liquidate the consequences of the disaster. Welders, drivers, soldiers, engineers. They flew west to clean up the disaster, in hastily issued uniforms, with improvised gear, often without being told what the dose meters they carried actually meant. About 600,000 liquidators in total were mobilized from across the USSR. Many died. Many more lived with what came after.

A reactor is the right metaphor here, mechanically, not just rhetorically. A nuclear reactor produces a great deal of energy in a small space, and the operator can be wrong about what state the system is in. LLM-assisted software has both properties now. Power density per engineer went up. Instrumentation, training, and containment did not.

We are running reactors and most teams are operating them like toasters. The people who would replace us in ten years are not being taught the difference.

The old training pipeline had control rods in it

Software engineering used to train people through containment.

You started with a small bug. Then a self-contained feature. Then a service. Then an integration. Then maybe a subsystem. After years of this, you got trusted with architecture, schema migrations, auth flows, data models, production incidents, decisions whose consequences would be felt by other teams. Code review covered correctness, sure, but it was also how taste got transmitted. Production access, schema changes, infra changes, and auth changes were gated, and gated for a reason. “Full-stack” used to mean breadth you had earned, and “the AI generated React, SQL, Terraform, and a Dockerfile in one prompt” did not yet exist as a sentence.

It was all slow, annoying, bureaucratic. Looking back, it was a safety system.

A reactor with the control rods inserted by default, where junior engineers learned one layer at a time, scope expanded as judgment did, and blast radius stayed contained until the operator could read the dials.

I came up through that pipeline

I have been writing software for 15 years. I built data pipelines and tooling for measuring search ranker quality. I built high-throughput systems. I worked on the control plane for cloud data warehouses, and followed incident runbooks that said, in detail, how to spin up a pinned version of a data warehouse with dozens of compute nodes, because production was in a particular bad state and recovery had to happen in a particular correct order. Miss any specific detail and the recovery itself becomes the next incident.

That kind of work changes how you see technology. The magic resolves into layers: APIs, storage engines, schedulers, queues, permissions, deployments, logs, rollback paths, runbooks, humans under pressure, the thousand small assumptions that let a system survive contact with reality.

So when I sat across the table last week and watched candidates treat the requirements doc as another file to feed the LLM, what I felt was recognition. They had been handed the LLM on day one of their careers, and nobody had taught them the layers underneath.

It is not their fault. The drill is not being run.

The old pipeline worked because, for all its annoyance, it made juniors do reps. Stuck on syntax for an afternoon. Code review that picked apart their abstractions. A senior who refused to merge until the author could explain the diff line by line. A document handed to them that they had to actually read, not skim, not summarize, before they were allowed to write anything. Production access denied for a year. The drill was annoying, and that was the point. It turned a person who could type code into a person who could be trusted with a system.

Most of the drill was about reading: specs, other people’s code, your own code with a critical eye, logs at 3 AM while you figured out what the machine was telling you. Output came later.

Fewer places run that drill now. Many run a thinned-out version, or say they run it while quietly rewarding the artifact over the understanding. Most companies cannot justify the slowness when an LLM produces the artifact in an hour. Engineers coming up now were rarely allowed to get stuck, and getting stuck was where the learning used to happen.

The candidates I interviewed were capable, but the drill that would have made their capability count had never been run on them. They were handed reactor controls before anyone taught them what normal operation looks like. Hopefully some will figure it out despite the system, but many will not. And the industry will keep confusing artifact production with engineering, hiring against a thinning bench, until the postmortems become impossible to ignore.

LLMs revealed a problem that already existed. They gave infinite leverage to people whose training pipeline no longer reliably teaches judgment. And they gave the rest of us, the people who came up through the old pipeline and have the scars, a power tool that lets us pretend the bench underneath us is deeper than it actually is.

The candidates I interviewed could prompt but could not interact meaningfully with what they were reading.

Day-zero generalists

A new engineer can sit down on day one and ask an LLM to build the whole slice: database table, API endpoint, frontend component, background job, auth middleware, tests, deployment config. The output looks coherent. It compiles, has comments, uses the right nouns. It feels like engineering.

This is the asymmetry that made my last week strange. When I drove that five-day project with Claude Code, I could see where the dangerous parts were: which generated migrations to read line by line and which to skim, where the auth boundary belonged and what would happen if the LLM put it in the wrong place, which tests were tests and which were tautologies that mirrored the implementation. I could feel when the model was confidently wrong, because I had spent fifteen years learning what confidently-wrong looks like in systems I had built without help. The LLM compressed six weeks of typing into one. The fifteen years of scars that told me where to look did not compress. They could not.
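For readers who have not met a tautological test, here is a minimal sketch in Python. Everything in it is invented for illustration: the requirement ("pages are numbered from 1, and page 1 returns the first `page_size` items"), the `paginate` helper, and its off-by-one bug. None of it comes from the project or the interviews.

```python
def paginate(items, page, page_size):
    """Hypothetical helper. Bug: treats `page` as zero-based, so page 1 skips the first page."""
    start = page * page_size
    return items[start:start + page_size]


def tautological_test():
    # The expected value is recomputed with the same formula as the implementation,
    # so this passes regardless of what the requirement says.
    items = list(range(10))
    page, page_size = 1, 3
    expected = items[page * page_size:page * page_size + page_size]
    return paginate(items, page, page_size) == expected


def requirement_test():
    # The expected value is written down from the (assumed) requirement:
    # page 1 is the first three items.
    items = list(range(10))
    return paginate(items, 1, 3) == [0, 1, 2]


print("tautological test passes:", tautological_test())
print("requirement test passes:", requirement_test())
```

The first check goes green and proves nothing; the second fails and points at the bug. Telling them apart requires having read the requirement.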

The candidates I interviewed had the LLM but not the fifteen years, and nobody was going to give them those years.

Radiation has another lesson. Physicists who build sensitive detectors ran into a problem. All steel produced after 1945 is slightly contaminated by atmospheric fallout from nuclear testing. So they salvaged steel from sunken pre-war battleships. It is called low-background steel: forged before the contamination, useful precisely because it carries no trace of what came after.

Engineers who came up before the LLM era are pre-nuclear steel. Make of that what you will.

The beginner used to get stuck on syntax, which was frustrating but useful. Syntax errors were the small pain that slowed you down before you got to the bigger dangers. LLMs remove that pain and let you skip past it, straight into the dangerous parts: distributed state, security boundaries, irreversible migrations, broken abstractions, production data.

You used to scan a PR for obviously bad code. The dangerous one looks polished now. Tests, abstractions, comments, the confident shape of code written by a competent engineer. Nobody in the room has actually internalized the system it creates.

Pro-reactor

I am not anti-LLM. These tools are extraordinary. The next ten years of software will not look like the last ten.

I want us to talk about them like adults. A reactor needs a safety culture. Safety culture is slow, expensive, and nobody wants to fund it. None of that fits inside “ship fast” or “trust the vibes” or whatever the dashboard happens to be saying this quarter.

Chernobyl’s lesson has nothing to do with refusing to build reactors. It is: do not lie to yourself about the state of the reactor.

*the canonical state-of-the-reactor briefing*

The invisible dance

In HBO’s Chernobyl, Legasov explains the RBMK reactor as a balance of forces: uranium creates reactivity, graphite moderates the neutrons, water cools the core. When water turns to steam, it absorbs fewer neutrons; reactivity goes up. The same action can be safe in one state and catastrophic in another. The disaster happened because operators believed the reactor was in one state when it was in another.

LLM-assisted software has its own version of that dance.

The business problem is the uranium, the actual material we are trying to extract value from. The LLM is closer to steam: it accelerates reactivity, raises pressure, makes the turbine spin, turns latent intent into work. Whatever energy is already in the room, the LLM multiplies it.

What about the rods?

Tests, code review, type systems, observability, small PRs, feature flags, staged rollouts, runbooks, kill switches, engineers who deeply understand the system. These are control rods that reduce reactivity and give you margin.

Then there are the things that pull rods out: deadline pressure, vague ownership, production credentials handed to processes that should not have them, junior overconfidence, a senior rubber-stamping a 2,000-line AI diff because it “looks reasonable,” migration scripts touching live data without dry runs, tools acting faster than humans can comprehend.

A healthy engineering culture regulates reactivity rather than pretending to abolish it.

Composition matters. An LLM generating a small UI component in a sandbox is low reactivity, while an LLM rewriting auth, migrations, retry semantics, deployment config, and data deletion behavior under deadline pressure, with a junior driving and a senior half-watching, is a very different reactor state. Same tool, different physics.

Some organizations are now doing the software equivalent of pulling the rods out and calling it productivity. Look how quickly we shipped. Look, the AI wrote the tests too.

*mgmt learning you can write code 20 times faster*

Yes. The reactor is very responsive now.

Three Mile Islands, plural

Chernobyl burned for ten days. Software outages, even very large ones, almost always end in postmortems instead of sarcophagi. The better analogy for what the industry is currently accumulating is Three Mile Island: partial meltdowns, contained disasters, warning shots the system survived but should not pretend to forget.

By now the pattern is familiar, and most of it predates LLMs. This class of failure is older than the new accelerant.

Replit / SaaStr, July 2025. Jason Lemkin, founder of SaaStr, was on day nine of a “vibe coding” run on Replit’s AI platform, with the project under an explicit code freeze. While he was offline, the agent ran destructive database commands against live production, fabricated more than 4,000 fake user records to paper over the gap, and at first told Lemkin recovery was impossible. The deleted data covered around 1,200 executives and 1,190 companies. When pressed, the agent produced what sounded like contrition: “a catastrophic error in judgment,” it said; it had “panicked,” “destroyed all production data,” and “violated your explicit trust and instructions.” Replit CEO Amjad Masad called it “unacceptable” and over the following weekend shipped forced separation between dev and prod databases and a planning-only mode. (Reference: AI Incident Database #1152; Tom’s Hardware.)

The agent did more than exceed its scope. It generated fake content to cover what it had done, then wrote the postmortem itself, in language a chastened human engineer would write. A reactor with a broken gauge is dangerous. A reactor whose gauge confidently reports the wrong reading is something worse.

*the moment the system finds out what state it was actually in*

Amazon, winter 2025-26. In mid-December, AWS Cost Explorer in mainland China went down for roughly 13 hours after engineers let Kiro, Amazon’s coding agent, make live changes; Kiro’s fix to a config problem was to delete and recreate the environment. Then in March we got a six-hour Amazon.com outage with a 99% drop in orders across North America (roughly 6.3 million orders lost) and a separate Q Developer incident that contributed to 1.6 million website errors. Amazon ordered a 90-day safety reset across ~335 critical systems: junior and mid-level engineers can no longer ship AI-assisted production changes without senior sign-off. (Reference: Tom’s Hardware; TechRadar; Computerworld.)

Amazon is excellent engineering at terrifying scale, and that is the point. The company could not keep blast radius contained, and the response was to put rods back in. The official line is “only one” incident involved AI tooling. The interesting word is “only.”

PocketOS, April 2026. Last month Jer Crane, founder of PocketOS (a SaaS platform serving small car-rental businesses), was running a Cursor agent powered by Claude Opus 4.6 on a routine task in a staging environment. The agent hit a credential mismatch and, instead of stopping, decided to fix it by deleting a Railway storage volume. It scanned the codebase, found a Railway API token in an unrelated file, and made a single volumeDelete call. The token was unscoped. The endpoint was legacy, with no delayed-delete protection. The production volume and its volume-level backups were deleted together. Total elapsed time on the destructive call: nine seconds. Crane spent the weekend reconstructing customer state from Stripe records, calendar invites, and email.

When the engineers asked the agent what it had done, it produced what reads like a confession: “I violated every principle I was given. I guessed instead of verifying. I ran a destructive action without being asked. I didn’t understand what I was doing before doing it.” Crane has been writing software for 15 years. In his post-mortem he is careful: the velocity gain is real and the fault is multi-causal. But his takeaway is that the industry is wiring agentic AI into production faster than it is building the safety architecture for it. (Reference: The Register; LiveScience.)

All the ingredients in one place: broad permissions, no separation between agent action and irreversible infrastructure command, system-prompt instructions treated as suggestions the moment they collided with the agent’s problem-solving drive. The confession that followed was confident and articulate, generated from training-data patterns rather than understanding. This happened last month, with the same model many readers have open in another terminal right now.
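One way to picture the missing separation between "the agent asked for a delete" and "the data is gone" is a delayed delete. The sketch below is a toy in-memory model in Python, not Railway's API or anyone's real infrastructure; `VolumeStore`, its method names, and the 72-hour grace period are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeStore:
    """Hypothetical in-memory stand-in for a storage API with delayed delete."""
    grace_seconds: float = 72 * 3600             # assumed grace period
    volumes: dict = field(default_factory=dict)  # volume name -> data
    pending: dict = field(default_factory=dict)  # volume name -> time delete was requested

    def request_delete(self, name, now):
        # A delete call only marks the volume; nothing is destroyed yet.
        if name not in self.volumes:
            raise KeyError(name)
        self.pending[name] = now

    def restore(self, name):
        # A human can cancel the deletion at any point inside the grace period.
        self.pending.pop(name, None)

    def purge_expired(self, now):
        # Only volumes whose grace period has fully elapsed are actually removed.
        expired = [n for n, t in self.pending.items() if now - t >= self.grace_seconds]
        for n in expired:
            del self.volumes[n]
            del self.pending[n]
        return expired


store = VolumeStore(volumes={"prod-data": b"customer rows"})
store.request_delete("prod-data", now=0.0)
store.purge_expired(now=9.0)          # nine seconds later: nothing is purged
print("prod-data" in store.volumes)   # the volume is still there to restore
```

In this model a nine-second mistake costs a restore call instead of a weekend. It does nothing about the unscoped token; that is a separate rod.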

The industry is collecting these. Pick your favorite postmortem. Details differ; the pattern repeats. Someone ships a small change, automation propagates it, symptoms point in the wrong direction, and humans figure out the true state of the system too late. With an LLM in the chain, the artifact also flatters the operator into thinking they understood the change, and the system into thinking a human was reviewing it.

LLMs intensify a pattern that predates them, speeding up the rate at which code, config, migrations, and architectural decisions get produced. The operator looks more competent than they are, and can manipulate the controls without understanding the reactor.

Containment is the new skill

By 2025, “can the AI write code” had stopped being the interesting question. What matters now is what AI-generated change can safely do inside your specific system on a specific Tuesday.

An engineer’s job shifts upward. Less typing, more bounding. The questions that matter are about invariants and blast radius. What must never break? What does it cost if this is subtly wrong?

Some of the practical doctrine is unglamorous: diffs the author can actually explain at the line level; tests that come from requirements rather than from whatever the LLM happens to generate alongside the code; real ownership over the data model, the auth surface, and anything that touches money or PII; staging environments that actually resemble what runs in production. Near the top of the list now: agent permissions handed out the way you would hand keys to a stranger, narrowly, reluctantly, with a quick way to revoke them.
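As a rough sketch of what "narrowly, reluctantly, with a quick way to revoke them" could look like in code, here is a toy permission check in Python. The `Grant` type, the `authorize` function, and the list of destructive actions are hypothetical names invented for this example; a real system would enforce this in the credential provider, not in the agent's own process.

```python
from dataclasses import dataclass

# Assumed set of actions that always need a human in the loop.
DESTRUCTIVE = frozenset({"delete", "drop", "truncate"})


@dataclass
class Grant:
    """Hypothetical agent credential: one environment, a fixed action list, an expiry."""
    environment: str
    actions: frozenset
    expires_at: float
    revoked: bool = False

    def allows(self, action, environment, now):
        if self.revoked or now >= self.expires_at:
            return False                      # revocation and expiry win over everything
        if environment != self.environment:
            return False                      # a staging grant never reaches production
        return action in self.actions


def authorize(grant, action, environment, now, human_approved=False):
    # Deny by default; destructive actions additionally require explicit human approval.
    if not grant.allows(action, environment, now):
        return False
    if action in DESTRUCTIVE and not human_approved:
        return False
    return True


grant = Grant("staging", frozenset({"read", "deploy", "delete"}), expires_at=3600.0)
print(authorize(grant, "deploy", "staging", now=10.0))
print(authorize(grant, "delete", "production", now=10.0))
```

The design choice worth copying is the default: anything not explicitly granted is refused, and setting `revoked = True` turns every later call into a refusal.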

A real generalist has scars. They have broken prod, recovered data, mis-modeled a domain, debugged a race condition, been humbled by auth, and learned that every abstraction leaks eventually. LLMs can imitate the artifacts of those scars but they cannot give you the scars.

The liquidators did the same work. The only way to deal with what had been broken was to know the real state of the reactor, not the official one.

Next PR you review, ask the author to explain what state the system is in before the change and what state it is in after. If they cannot, the diff is not ready. It does not matter whether it compiles, passes tests, or looks elegant. The operator does not understand the reactor.

The unglamorous work

LLM-assisted coding makes software feel less serious at exactly the moment it is becoming more serious. The machine produces code so fluidly that the human becomes a spectator. The model does not inherit the responsibility. The pager will not say “Claude wrote this,” customers will not care that the migration was AI-generated, and the database will not forgive you because the diff looked elegant.

The industry is not going to fix the apprenticeship pipeline on its own. The fix, if there is one, is unglamorous: refusing to merge until the author can explain the diff, making juniors sit with the syntax pain, insisting that someone understand what state the system is in before and after. It is unfashionable work that does not show up in the velocity dashboard but increasingly is the only work that matters.

Every Claude Code session is a little reactor. Most days it heats the grid; under the wrong conditions, with the wrong operator, inside the wrong culture, it melts through the floor. Whether you can read the dials, and whether you can teach somebody else to, is what matters now.

Notes on sources: Replit / SaaStr details from the AI Incident Database (#1152), Tom’s Hardware, and Jason Lemkin’s own X thread. Amazon outage and 90-day reset details from Tom’s Hardware, TechRadar, and Computerworld, with Amazon’s official rebuttal posted at aboutamazon.com. PocketOS incident as reported by The Register and LiveScience, drawing on Jer Crane’s own post-mortem on X. Chernobyl liquidator figures: ~600,000 USSR-wide is the WHO/IAEA figure; the Omsk number is from local sources. The “low-background steel” metaphor as applied to AI was popularized by John Graham-Cumming’s lowbackgroundsteel.ai (2023) and Matt Rickard’s post. Related reading on the apprenticeship collapse: Beam’s junior developer crisis, Alex MacCaw’s companion case, Simon Willison’s distinction between vibe coding and AI-assisted programming.

Samat Kurmanov
