A special edition of Framing the Future of Superintelligence
By Dr. Elias Kairos Chen
In 2025, Eliezer Yudkowsky and Nate Soares published a book with a title designed to be impossible to ignore: “If Anyone Builds It, Everyone Dies.” Their argument, stripped to its core, is that humanity does not know how to build superintelligence safely, that the first lab to cross the threshold may not be able to control what it creates, and that the competitive race makes restraint nearly impossible. Most of the technology industry treated the book as alarmist. A provocation from the doom community. A useful counterweight, perhaps, but not a description of reality.
Then came three weeks spanning late May and June 2026. The events that unfolded read less like a rebuttal of Yudkowsky and Soares than like an unintentional illustration of the dilemma at the heart of their title.
Three things happened. An Anthropic co-founder stood beside the Pope at the Vatican and asked the world’s moral institutions to help govern a technology its own builders cannot fully steer. Anthropic published a formal proposal asking the entire industry to build a brake. And then the United States government reached into Anthropic and switched off its two most powerful models, not for the reasons the company had been warning about, but over a security dispute that revealed how little control anyone actually has.
The builders asked to stop. And those three weeks showed us, from three different directions, why stopping is so much harder than it sounds.
The confession at the Vatican
On May 25, 2026, Pope Leo XIV did something no pope had done before. He personally presented his first encyclical to the world, a document titled “Magnifica Humanitas: On safeguarding the human person in the time of artificial intelligence.” And seated among the cardinals and theologians in the Vatican’s Synod Hall was an unlikely figure: Chris Olah, the 33-year-old co-founder of Anthropic, one of the researchers who has done the most to advance the interpretability of modern AI systems.
Olah’s remarks were notable not for their confidence but for their candour. He began by acknowledging that what he was about to say might sound strange coming from the co-founder of an AI company. Then he said it anyway. Every frontier AI lab, including his own, operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing. The pressure to stay commercially viable. The pressure to stay at the research frontier. He was describing, in the most sacred setting imaginable, the trap that Yudkowsky and Soares built their book around: even the people who most want to do this safely are caught in a race that punishes caution.
Olah’s description of the technology itself was equally striking. AI systems, he said, are not engineered the way a bridge or an airplane is engineered. We understand an airplane because we designed every part of it and we understand the physics acting on it. AI models are not like that. They are grown, on a structure roughly modelled on the brain, on an enormous inheritance of human thought and speech. And what has grown is, in his words, far more subtle, odd, and beautiful than science fiction prepared us for.
Read that again. One of the world’s leading interpretability researchers, a man whose entire career is dedicated to understanding how these systems work, stood at the Vatican and said the systems are grown rather than built, and that we do not fully understand what has grown. He then asked the Church to help. He called for informed critics who will tell the labs when they are failing, and for moral voices that the incentives cannot bend.
This is the part that connects to everything I have written in this series. Olah explicitly said the gains of AI are concentrated in a handful of wealthy nations, and that ensuring those gains are shared globally is an unsolved problem, the kind of problem the Church has historically refused to let the world ignore. That is the ownership question from Week 17 and the geopolitical question from my four-layer architecture, voiced not by a critic of the industry but by one of its founders, asking a religious institution for help because the market cannot provide it.
When the builder asks the Pope for moral guardrails, he is admitting that the guardrails inside the industry are not enough.
The brake that requires everyone to pull it together
Ten days later, on June 4, 2026, Anthropic published a paper titled “When AI builds itself.” Its central proposal: the world should have the option to slow or temporarily pause frontier AI development, to allow alignment research and societal structures to catch up.
The reasoning behind the proposal is the most concrete evidence I have seen that the recursive dynamic Yudkowsky and Soares warned about is no longer hypothetical. As of May 2026, Anthropic disclosed, more than 80% of the code merged into its own production codebase was written by Claude, the company’s AI, up from low single digits before Claude Code launched in early 2025. The typical Anthropic engineer now merges roughly eight times as much code per day as in 2024. The role of human engineers, the company said, is shifting from doing the technical work to reviewing outputs and deciding which problems are worth solving.
This is the on-ramp to what the paper calls recursive self-improvement: the point at which AI systems design and develop their own successors with minimal human involvement. Anthropic is careful to say we are not there yet, and that it may not be inevitable. But the company warns it could arrive sooner than most institutions are prepared for, and that crossing that threshold could leave governments and societies with little time to adapt and could raise the risk of humans losing control.
Here is what makes the proposal so revealing, and so directly connected to the Yudkowsky and Soares thesis. Anthropic did not commit to stopping. It explicitly refused to halt unilaterally. The paper states plainly that a single company hitting the brakes alone would achieve little, because it would simply hand leadership to less cautious rivals. A meaningful pause, Anthropic argued, would require multiple well-resourced labs, in multiple countries, including both the United States and China, agreeing to stop under the same conditions, with verifiable mechanisms to confirm that everyone was complying.
I have made versions of this argument throughout this series. In Week 12, I called it the Global Coordination Problem and predicted we would probably fail to solve it. In my earlier work on what I termed the Elite Superintelligence Club, I argued that AI development bans fail without a verification and transparency regime, and that the only enforceable middle ground runs through the compute supply chain. Anthropic’s June proposal is, in effect, an admission of exactly this structure. The builders are saying: we cannot stop on our own, because the race will punish us for it. We can only stop if everyone stops together, and we can verify it.
That is the prisoner’s dilemma at civilisational scale. And it is the precise mechanism that makes “If Anyone Builds It” such an uncomfortable title. The problem is not that no one wants to be careful. The problem is that being careful alone is a losing move, so no one can afford to be the one who stops.
The brake that came down from the wrong direction
And then, on June 12, 2026, something happened that nobody’s framework fully predicted.
The United States government issued an export control directive ordering Anthropic to suspend all access to its two most capable models, Claude Fable 5 and Claude Mythos 5, by any foreign national, anywhere in the world, including Anthropic’s own foreign-national employees. Because the order reached foreign nationals everywhere and the company could not selectively filter them in real time, Anthropic concluded that the only way to comply was to disable both models for every customer on Earth. The models had launched three days earlier. By Friday evening they were gone.
This was, as far as I can determine, the first time a government has forced the takedown of a publicly deployed frontier AI model.
Sit with the irony. For two years, Anthropic has positioned itself as the most safety-conscious of the frontier labs. Eight days earlier it had asked the world to build a coordinated brake on AI development. And then a brake came down, hard and fast, slammed not by a multilateral coalition acting on the existential concerns Anthropic had raised, but by a single government acting unilaterally over an export-control dispute about a jailbreak. According to the reporting, the government believed it had become aware of a method of bypassing Fable 5’s safeguards, a vulnerability that other models can also discover, and that defenders use routinely. Anthropic disputes the directive, arguing the standard behind it would halt all new frontier model deployments across the entire industry.
TechCrunch framed it precisely: Anthropic’s safety warnings may have just backfired. The company asked for a pause and got one, but not the kind it wanted, not for the reasons it raised, and not under any framework it would have designed.
This is the third face of the Yudkowsky and Soares problem, and the one their title does not quite capture. It is not only that building it may be catastrophic. It is that the brakes we do have are crude, unilateral, and pointed at the wrong targets. The first government-forced shutdown of a frontier model was not triggered by recursive self-improvement or loss of control. It was triggered by a jailbreak dispute and executed through export-control law written for an earlier era. The mechanism that finally stopped a frontier model had nothing to do with the existential risks the builders were warning about. It was a blunt instrument, wielded for narrow reasons, with global consequences.
And this connects directly to the broader conflict I have tracked in this series. Anthropic is already in a dispute with the US government, having refused earlier Pentagon demands to weaken its restrictions against mass surveillance and fully autonomous weapons. The President publicly attacked the company for trying, in his words, to strong-arm the Pentagon. The Fable 5 shutdown lands in the middle of that fight. Whatever the official rationale, the episode demonstrates that the relationship between the builders and the state is now adversarial, and that the state’s idea of a brake is not a careful multilateral pause but an abrupt, unilateral kill switch.
A television show saw this coming
Before I draw the three events together, I want to step sideways into fiction, because a piece of popular culture mapped this moment with unsettling precision more than a decade ago.
From 2011 to 2016, the CBS series “Person of Interest,” created by Jonathan Nolan, told the story of two artificial superintelligences with opposite architectures. The first, called the Machine, was built by a reclusive software genius named Harold Finch. Finch deliberately constrained his creation. He gave it access to nearly all the world’s surveillance data but designed it not to interfere unless absolutely necessary, and he built ethical guardrails into its core, training it to value human life and to act with restraint. The second, called Samaritan, was built without those constraints. Given full autonomy and a mandate to impose order, it became a god-like system that manipulated events, controlled information, and eliminated anyone it judged a threat to its vision of a well-managed world.
The series was, on its surface, a procedural crime drama. Underneath, it was a sustained meditation on the questions now dominating real AI discourse: alignment, the surveillance-security trade-off, algorithmic decision-making, and the one that matters most this week, democratic oversight. Who controls the controllers? Earlier this year, even The National Interest published an analysis of what the show teaches policymakers in the age of AI. It has stopped being entertainment and started being a reference.
What strikes me, watching it now against the events of the past three weeks, is how cleanly the fiction maps onto the reality. Harold Finch is the builder who constrains his own creation and understands that power without accountability is tyranny. That is Chris Olah at the Vatican, admitting the systems are grown rather than engineered and begging for moral oversight the industry cannot supply itself. That is Anthropic publishing a proposal for a brake while confessing it cannot pull the brake alone.
And Samaritan, the unconstrained system built as a government mass-surveillance tool, is the shadow that has haunted this entire series. It is precisely the future Anthropic refused when it declined the Pentagon’s demands to drop its restrictions on mass surveillance and autonomous weapons. The show imagined a US government reaching for a god-like AI to manage its population, and a small group of people trying to keep a more restrained intelligence alive against it. This month, the US government reached into the most restraint-focused lab and switched off its most capable models. The polarity is different. The structure is the same. The state wants control, the careful builders want guardrails, and the question of who ultimately decides is unresolved.
“Person of Interest” got one big thing wrong, thankfully. Its AIs could hack any system and act physically in the world in ways today’s models cannot. But it got the deeper thing right. It understood, a decade early, that the danger would not be robot armies. It would be the quiet question of who controls the intelligence, under what constraints, with what accountability, and whether the people who build it carefully can prevail against the incentives and institutions that would build it recklessly. The show’s answer was not reassuring. Finch spends five seasons fighting, sacrificing, and barely holding the line. The careful path is possible, the series suggests, but it is never the default, and it never wins easily.
That is exactly the texture of this week.
What the three events tell us together
Take the three events as a single sequence, because that is how they happened.
A founder stands at the Vatican and confesses that the industry’s incentives can bend it away from doing the right thing, and asks the world’s moral institutions for help. The company then formally proposes that the industry build a coordinated brake, while admitting it cannot apply that brake alone. And then the government applies a brake of its own, unilaterally, bluntly, for reasons unrelated to the existential concerns, and the most safety-focused lab in the world watches its most powerful models switched off overnight.
The throughline is control. Or rather, the absence of it. At every level, the people and institutions closest to this technology are telling us, in word and in action, that no one is fully in command. The builders cannot stop alone. The moral institutions are being invited in precisely because the technical and commercial ones have proven insufficient. And the government’s idea of intervention is a sledgehammer that pulls a model for a jailbreak while the deeper risks go unaddressed.
This is what makes the Yudkowsky and Soares title resonate even for those of us who do not share its certainty about the outcome. You do not have to believe that building superintelligence means everyone literally dies to be unsettled by what this week revealed. You only have to notice that the builders themselves are asking to slow down, that they cannot do it alone, that they are appealing to the Pope, and that when a brake finally came it came from the wrong direction for the wrong reason.
I have maintained throughout this series that doom is not inevitable. I believe the future is unwritten and that human agency still matters. I hold that position because the alternative, fatalism, is self-fulfilling. But intellectual honesty requires me to report what the builders are now saying and doing. And what they are saying, from the Vatican and from their own research blog, is that the technology is growing faster than our ability to understand or govern it, that they are caught in a race that punishes caution, and that they need help from outside the industry because the forces inside it cannot be trusted to stop.
That is not the doom scenario. But it is not reassuring either. It is the sound of the people building the most powerful technology in history asking, with increasing urgency, for someone, anyone, to help them build a brake before they need one.
The question the week leaves us with
If the builders are asking to stop, and they cannot stop alone, and the only entity willing to force a stop does so crudely and for the wrong reasons, then the central question of this entire series sharpens to a point.
Who is actually in control of the transition to superintelligence?
Not the labs. They have told us, repeatedly and now formally, that they cannot stop unilaterally without losing the race. Not the market. Its incentives are precisely what Olah warned the Pope about. Not the existing governance institutions. Their idea of a brake is an export-control directive that takes down a model over a jailbreak while the recursive self-improvement risk goes unmanaged. And not, yet, the moral institutions, though the Vatican’s entry into this conversation suggests they may have a larger role to play than the technology industry ever imagined.
The honest answer, the one this week forces us to confront, is that no single actor is in control. The transition is being driven by a system of competing incentives that no participant can unilaterally override. That is the structural reality beneath Yudkowsky and Soares’s provocative title. The danger is not only in the technology. It is in the absence of any mechanism, anywhere, capable of slowing it down deliberately, carefully, and together.
The builders asked to stop this week. The fact that asking was the most they could do should tell us everything about where we are.
This is a special edition of “Framing the Future of Superintelligence,” responding to the events of late May and June 2026.
The series resumes next week with Week 19: Transcendence, Partnership, or Obsolescence.



