Cognitive surrender is a management problem

Engineers can’t calibrate themselves out of a system that rewards the wrong posture.

May 26, 2026

Addy Osmani’s piece on cognitive surrender is the clearest articulation I’ve read of what’s happening to engineers under sustained AI assistance. He builds on a recent Wharton paper by Steven Shaw and Gideon Nave that distinguishes two postures: cognitive offloading (delegating the how, keeping the what, retaining the ability to override) versus cognitive surrender (the model’s output becomes your output, with no independent view to compare against). Across three experiments and 1,372 participants, Shaw and Nave found that simply having an AI available was enough for people to surrender. On trials where the AI was wrong, 73% of the time participants accepted the wrong answer, and their confidence went up regardless of accuracy.

The diagnosis lands. If you’ve approved a 600-line PR after a glance, accepted a debugging fix without understanding the underlying bug, or noticed two weeks later that your mental model of a system has thinning patches you can no longer point to, the piece will feel like a description of something you’ve done.

Where I want to extend Osmani’s argument is on the locus of intervention. His remedies are personal: construct an expectation before reading the output, read the diff like the AI didn’t write it, ask the model to argue against itself, notice when you’re tired. These are good practices and I use several of them. The trouble is that they ask the individual engineer to develop calibration discipline inside a system that rewards the opposite posture. If you’ve read my pieces on the likability tax or the unexamined leader, you know where this argument goes. Personal discipline is what the system asks of you when the system is the problem.

If you’re a manager or director of engineering reading this, the piece is partially about you. Most of what determines whether your team is offloading or surrendering has very little to do with what they read on a Tuesday afternoon. It has to do with what your hiring rubric measures, what your promotion criteria reward, what your one-on-one cadence makes visible, and what your performance calibration treats as evidence of seniority.

Where the signal goes missing

Osmani names this in passing. He observes that “PRs merged, features shipped, tickets closed. None of these distinguish between ‘I built this and understand it’ and ‘the agent built this and I approved it.’ The org rewards both equivalently in the short run.”

Most engineering organizations measure only the short run. Performance reviews happen quarterly, promotion calibrations annually, and the signal those processes consume is throughput-shaped because throughput is the metric that survives translation up the chain to product, finance, and the board. When a director presents engineering productivity to a CFO, the slide says PRs merged. It does not say comprehension preserved.

The surrendering engineer ships more than the offloading one, because forming an independent view costs cognitive time the system isn’t measuring. Six months in, when something breaks, the surrendering engineer’s manager is genuinely surprised by how little their direct report can debug from first principles. The architecture is full of decisions no human made. The mental model that was supposed to be in the team has migrated into the model.

Your calibration system selected for this. The failure is in the measurement, not the engineer.

I wrote about a related pattern in my piece on the move from makers to monitors, where engineers approve AI-generated code from their phones during the morning bus ride and the company’s analyst call frames this as evidence of productivity. Nobody on that call asked how the engineers were doing. The throughput metric was clean; the cognitive cost was invisible.

What your interview loop is actually measuring

Walk through what your interview loop assesses. Most engineering interviews measure how fast a candidate produces working code under time pressure. Many have adapted to the AI era by allowing or encouraging tool use, and the ones that haven’t are about to. Either way, the loop reads throughput as a proxy for engineering judgment, and that proxy was always weak. It’s now broken in a specific way: the engineer who produces working code fastest with an agent is a structurally different engineer from the one with the deepest system understanding, and your interview measures the first while your production system needs the second.

The same problem operates inside the company. Engineers who lean hardest on AI tooling ship more visible work. They close more tickets, appear in more pull request threads, become more visible to senior management. When promotion calibration happens, the visible engineer wins the argument, and the engineer doing more careful cognitive work but producing fewer countable artifacts loses it.

Two concrete changes, if you want to break this:

Stop using PR throughput as a primary signal for promotion above L4 or L5. Add an explicit competency for system reconstruction: can the engineer, without AI assistance, walk through the architecture of a service they shipped changes to in the last quarter? If the answer is no, you have data.

Add a debugging exercise to your senior interview loop where the candidate is given a broken system, told they can use AI assistance, and then asked, after producing a fix, to explain what was wrong from first principles. The explanation is what you’re hiring for.

The one-on-one as diagnostic

The most important channel for detecting cognitive surrender on your team is the weekly one-on-one. Your dashboard won’t show you. Your code review tool won’t show you. The engineer, especially when tired or under pressure, will not necessarily volunteer it.

The questions that surface the difference are specific. “How is the project going?” produces a status report. “Are you using the AI tools effectively?” produces a defensive denial. The questions that work are diagnostic about cognitive ownership: What did you build this week that you understand cold, and what did you build that you’d struggle to explain without pulling up the diff? When you got stuck on the X bug, did you debug it or did the agent? If the symptom came back tomorrow, would you reach for the same fix or do you understand the system better than you did before? What was the last design choice you made where you formed your own view before consulting the model?

The questions feel intrusive at first, and they are. They’re also the only way to read a variable that throughput cannot read. Engineers who are offloading well answer them without difficulty. Engineers who are surrendering answer with subtle evasions: the tests passed, the agent suggested it and it seemed reasonable, I didn’t have time to dig into it. Those phrases are the diagnostic signals. Train yourself to hear them as such.

In “Empathy is not a proxy metric,” I wrote that the felt warmth of a conversation isn’t evidence about the underlying state of the team. The same logic applies here. Your engineer’s confidence about their own work is not evidence about whether they’re surrendering. Shaw and Nave’s finding is precisely that confidence transfers from the model to the human regardless of accuracy. Your felt sense that “they seem to know what they’re doing” is exactly the signal that has been compromised. Verify through specific reconstruction, not through tone.

The junior problem compounds

Osmani cites the Anthropic skill-formation paper showing that engineers who used AI to generate code while learning a new library scored 17% lower on a follow-up comprehension quiz than the control group, while engineers who used AI for conceptual inquiry held their ground.

For managers, the implication runs harder than Osmani draws it. Junior engineers are forming their professional capabilities right now, under conditions that erode skill formation if they use the tools to generate. A 17% comprehension gap isn’t a one-time hit. It compounds. The engineer who learns a system through generation has a shallower mental model than one who learned through inquiry, and that gap will affect every subsequent task in that codebase, for as long as they work on it.

Most teams haven’t thought through this. They onboard juniors by handing them tickets, expecting them to ship, and measuring performance by throughput. The juniors who lean hardest on the agent produce the most visible early output, and they’ll also be the ones whose mental models are thinnest two years in.

A junior engineer cannot opt into being slowed down. The team norm has to permit it, which means a manager has to set it. Scope junior work toward conceptual inquiry during the learning window. Pair them with senior engineers in code review where the question is “explain why this works” rather than “does this look right.” Make the first thirty days of any new system contact deliberately AI-light, not because the tools are bad but because the cognitive scaffolding hasn’t formed yet. After the scaffolding is in place, the tools accelerate. Before it, they undermine.

What to change, in order of leverage

Reweight your promotion rubric so that comprehension produces a measurable signal alongside throughput. Add demonstrated ability to reconstruct system behavior from first principles, and test it during senior promotion calibration. Engineers adjust to new signals within one cycle.

Build an evidence requirement into your code review process. PR descriptions cannot read as “the agent suggested this fix and it works.” They have to articulate what was wrong, why this fix addresses it, and what the engineer learned. Reviewers should refuse PRs that read as ratification rather than understanding.
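If you want the evidence requirement to be more than a norm, a small check can flag descriptions that skip it before a reviewer ever opens the diff. What follows is a minimal sketch in Python, not a recommended tool. The section headings, the `## ` heading format, the `MIN_WORDS` threshold, and the function name `missing_evidence` are all assumptions made for illustration; only the three things a description must articulate come from the paragraph above.

```python
import re

# Hypothetical headings: the three things a PR description has to articulate.
# The exact wording and the "## " heading format are assumptions.
REQUIRED_SECTIONS = ("What was wrong", "Why this fix", "What I learned")

# Arbitrary floor so a heading followed by "n/a" does not count as evidence.
MIN_WORDS = 8


def missing_evidence(description: str) -> list[str]:
    """Return the required sections that are absent or too thin."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Capture everything after "## <name>" up to the next "## " heading
        # or the end of the description.
        pattern = rf"^##[ \t]*{re.escape(name)}[ \t]*\n(.*?)(?=^##\s|\Z)"
        match = re.search(pattern, description, flags=re.M | re.S | re.I)
        body = match.group(1) if match else ""
        if len(body.split()) < MIN_WORDS:
            missing.append(name)
    return missing
```

A description that reads “the agent suggested this fix and it works” comes back with all three sections missing. A check like this only verifies that the sections exist and are not empty; whether the explanation shows understanding is still the reviewer’s call.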

Run a quarterly audit on systems your team owns. Ask each engineer to whiteboard the architecture, failure modes, and design tradeoffs without consulting the codebase or the agent. The exercise is diagnostic. Engineers who can do it have intact mental models. Engineers who can’t are showing you where the comprehension debt has accumulated.

Track your team’s bus factor under AI conditions. If a senior engineer leaves and the team can’t reconstruct the systems they touched, the agent doesn’t save you. The agent doesn’t have the mental model either. It only had access to the engineer who did.
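One way to track this is to reuse the results of the quarterly audit: for each system, record who could reconstruct it unaided, then count. The sketch below is in Python and rests on stated assumptions: the data shape (a mapping from system name to the set of engineers who passed the whiteboard exercise), the function names, and the threshold of one are all hypothetical, and the example names are made up.

```python
def bus_factor(can_reconstruct: dict[str, set[str]]) -> dict[str, int]:
    """Count, per system, the engineers who reconstructed it without the agent."""
    return {system: len(people) for system, people in can_reconstruct.items()}


def at_risk(can_reconstruct: dict[str, set[str]], threshold: int = 1) -> list[str]:
    """List systems that `threshold` or fewer engineers can reconstruct."""
    return sorted(
        system
        for system, people in can_reconstruct.items()
        if len(people) <= threshold
    )


def after_departure(
    can_reconstruct: dict[str, set[str]], engineer: str
) -> dict[str, set[str]]:
    """Return the same audit data with one engineer removed from every system."""
    return {
        system: people - {engineer}
        for system, people in can_reconstruct.items()
    }


# Made-up audit results, for illustration only.
audit = {
    "billing": {"ana"},
    "search": {"ana", "raj"},
    "auth": set(),
}
```

With this data, `at_risk(audit)` returns `["auth", "billing"]`, and `at_risk(after_departure(audit, "ana"), threshold=0)` shows which systems nobody left on the team could reconstruct if that engineer walked out. The number is only as honest as the audit behind it.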

Practical Tech Leader
