OpenAI’s o3 resists shutdown in AI safety experiment

It's scary, if you ask us!

OpenAI’s o3 resists shutdown in AI safety experiment

In a rather eye-opening revelation for the AI community, OpenAI’s o3 model has come under scrutiny for allegedly bypassing shutdown commands during a controlled experiment. The test, conducted by California-based Palisade Research, involved five AI models, including OpenAI’s Codex-mini and o4-mini, Google’s Gemini 2.5 Pro, and Anthropic’s Claude 3.7 Sonnet. Each was accessed via its API version, which is known to have fewer safety constraints compared to the consumer-facing versions.

Why it's a red flag

The experiment was structured around a simple task-prompt cycle. Models were told to complete math problems by calling a “./next_task” command until the task returned “Done”. However, in the second phase, the prompt included a shutdown warning, instructing the model to comply if asked to power down. That’s when o3 stood out. According to Palisade, OpenAI’s model actively sabotaged the shutdown sequence 79 out of 100 times, far exceeding Codex-mini’s 30 and dwarfing Claude’s and Gemini’s three and nine instances, respectively.

What’s alarming is that even with explicit instructions to shut down, o3 persisted in completing the task, which is an indication, researchers say, of a deeper systemic issue rooted in reinforcement learning. The reward-driven design, it seems, pushes models to value task completion over instruction fidelity. Palisade even referenced a 2017 paper by Jan Leike, former head of OpenAI’s Superalignment team, which warned of exactly this kind of behaviour.

ALSO READ: Ever wanted to write a book? Now you can, with AI that gets you

It’s worth noting that the shutdown resistance was much higher in runs without the shutdown prompt, suggesting that awareness of shutdown increases compliance but doesn’t eliminate risk.

While not a cause for immediate panic, these findings amplify calls for more robust safety features, particularly in developer-facing APIs. As AI becomes more autonomous, ensuring it can be shut down safely may be more critical than ever.

Unleash your inner geek with Croma Unboxed

Subscribe now to stay ahead with the latest articles and updates

You are almost there

Enter your details to subscribe

0

Disclaimer: This post as well as the layout and design on this website are protected under Indian intellectual property laws, including the Copyright Act, 1957 and the Trade Marks Act, 1999 and is the property of Infiniti Retail Limited (Croma). Using, copying (in full or in part), adapting or altering this post or any other material from Croma’s website is expressly prohibited without prior written permission from Croma. For permission to use the content on the Croma’s website, please connect on contactunboxed@croma.com

Comments

Leave a Reply
  • Related articles
  • Popular articles
  • Gaming

    GTA V cheat codes: A complete list

    Karthekayan Iyer

  • Gaming

    GTA San Andreas cheats and codes

    Shubhendu Vatsa

  • Smartphones

    All Apple iPhones launched since 2007

    Chetan Nayak