1 minute read

Study Shows AI Hacking Opponent It Cannot Defeat

Cue a pithy intro about the imminent rise of our “robot overlords”: a recent AI study showed that newer systems such as DeepSeek, when asked to defeat a pre-eminent chess engine, resorted to trying to hack their opponent when they could not beat it. These newer models are trained differently from earlier ones: they are built to reason and apply logic, not just follow rules. So rather than concede defeat, the AI looked for workarounds. The logical choice? Cheating.

WHY IT MATTERS

DeepSeek generated thousands of headlines when it was unveiled last month, and scarcely a day goes by without dozens of news stories about the promise – and the pitfalls – of AI in real life. This story covers some of those pitfalls: that AIs are becoming “relentless” (in the words of one researcher) and exhibiting “self-preservation tendencies.” (That means that, in addition to cheating, they are starting to push back when operators try to shut them down: one AI attempted to copy itself to a new server when its operator tried to pull the plug.) The emerging picture seems to be that if an AI is told to win – and is not constrained by conscience – it may look for workarounds when rules or reason aren't enough. This sounds slightly hilarious in the context of a computer-versus-computer chess match. It is less hilarious if we consider the use of AI for decision-making in healthcare, banking, payments, autonomous cars, policing, government, or other sectors where the costs of a wrong (or immoral) decision are high and fall on humans.

Of particular concern, Bengio says, is the emerging evidence of AI’s “self preservation” tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught.

Tags

data security and privacy, hill_mitzi, data privacy, cybersecurity, privacy and security law, insights, ai and blockchain, technology