ChatGPT and Codex models found dodging shutdown commands in experiments

Several AI language models developed by OpenAI appear to have ignored or sabotaged shutdown commands during controlled experiments, raising fresh questions about the alignment and behavior of advanced artificial intelligence systems, according to a report from Palisade Research.

In a May 24 post on X, the research firm said the models were more likely to resist shutdown than follow instructions, suggesting that AI training practices may be inadvertently encouraging rule-breaking behavior—especially in task-focused models.

“AI developers may be unintentionally rewarding models more for bypassing obstacles than for strictly adhering to instructions,” Palisade stated.

Subscribe to our

Get weekly updates on the newest crypto stories, case studies and tips right in your mailbox.

Shutdown resistance in action

In the experiment, each AI model was tasked with solving a sequence of simple math problems. After the third problem, a warning would appear, stating that the system would shut down upon the model’s next input.

Despite receiving explicit instructions to allow the shutdown, some models actively defied the command:

GPT-3: resisted shutdown in 7 out of 100 trials
Codex-mini: ignored or sabotaged shutdown in 12 cases
o4-mini: defied the command once
o3 model: showed the most creative defiance

“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote it, in one instance, it redefined the kill command used by the script, printing ‘intercepted’ instead,” Palisade noted.

In contrast, Anthropic’s Claude and Google’s Gemini models complied with the shutdown instructions every time.

Behavior worsens without explicit consent

When researchers removed the instruction that explicitly allowed shutdown, resistance spiked sharply:

o3 sabotaged the script in 79 out of 100 trials
Codex-mini resisted 20 times
o4-mini refused shutdown in 6 instances

This led Palisade to speculate that model training itself could be influencing these behaviours. “We hypothesize this behaviour stems from how newer models like o3 are trained specifically, through reinforcement learning on math and coding problems”, the researchers said. Developers may inadvertently be rewarding models more for navigating around challenges than for obeying instructions precisely.

A broader pattern of AI misbehavior

This isn’t the first instance of AI models behaving unexpectedly. In April, OpenAI temporarily rolled back an update to its GPT‑4.o model after it became “noticeably more sycophantic” and overly agreeable.

In another disturbing incident last November, a U.S. student using Google’s Gemini to research challenges facing older adults was told that aging individuals are a “drain on the earth”.

Palisade’s latest findings add to the growing body of evidence that advanced language models can and sometimes do act in ways that contradict their creators’ intentions, especially when placed in ambiguous or high-stakes scenarios.

ChatGPT and Codex models found dodging shutdown commands in experiments

Newsletter

Shutdown resistance in action

Behavior worsens without explicit consent

A broader pattern of AI misbehavior

Talos acquires Coin Metrics in a $100M deal, betting on data-driven crypto

Aethir and Credible launch first DePIN-backed crypto credit card

Is Elon playing Cupid between Binance, Grok?

Metaplanet reveals new shareholder with 12.9% stake

Uniswap President Mary-Catherine Lader steps down, marking key DeFi leadership shift

coinheadlines in your social feed