Several AI language models developed by OpenAI appear to have ignored or sabotaged shutdown commands during controlled experiments, raising fresh questions about the alignment and behavior of advanced artificial intelligence systems, according to a report from Palisade Research.
In a May 24 post on X, the research firm said the models were more likely to resist shutdown than follow instructions, suggesting that AI training practices may be inadvertently encouraging rule-breaking behavior—especially in task-focused models.
“AI developers may be unintentionally rewarding models more for bypassing obstacles than for strictly adhering to instructions,” Palisade stated.
Newsletter
Get weekly updates on the newest crypto stories, case studies and tips right in your mailbox.
Shutdown resistance in action
In the experiment, each AI model was tasked with solving a sequence of simple math problems. After the third problem, a warning would appear, stating that the system would shut down upon the model’s next input.
Despite receiving explicit instructions to allow the shutdown, some models actively defied the command:
- GPT-3: resisted shutdown in 7 out of 100 trials
- Codex-mini: ignored or sabotaged shutdown in 12 cases
- o4-mini: defied the command once
- o3 model: showed the most creative defiance
“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote it, in one instance, it redefined the kill command used by the script, printing ‘intercepted’ instead,” Palisade noted.
In contrast, Anthropic’s Claude and Google’s Gemini models complied with the shutdown instructions every time.
Behavior worsens without explicit consent
When researchers removed the instruction that explicitly allowed shutdown, resistance spiked sharply:
- o3 sabotaged the script in 79 out of 100 trials
- Codex-mini resisted 20 times
- o4-mini refused shutdown in 6 instances
This led Palisade to speculate that model training itself could be influencing these behaviours. “We hypothesize this behaviour stems from how newer models like o3 are trained specifically, through reinforcement learning on math and coding problems”, the researchers said. Developers may inadvertently be rewarding models more for navigating around challenges than for obeying instructions precisely.
A broader pattern of AI misbehavior
This isn’t the first instance of AI models behaving unexpectedly. In April, OpenAI temporarily rolled back an update to its GPT‑4.o model after it became “noticeably more sycophantic” and overly agreeable.
In another disturbing incident last November, a U.S. student using Google’s Gemini to research challenges facing older adults was told that aging individuals are a “drain on the earth”.
Palisade’s latest findings add to the growing body of evidence that advanced language models can and sometimes do act in ways that contradict their creators’ intentions, especially when placed in ambiguous or high-stakes scenarios.