Here’s yet another troubling story about this “golden” era of AI. A hacker has exploited Anthropic’s Claude chatbot to carry out attacks against Mexican government agencies, according to a report by Bloomberg. This resulted in the theft of 150GB of official government data, including taxpayer records, employee credentials and more.
The hacker used Claude to find vulnerabilities in government networks and to write scripts to exploit them. They also tasked the chatbot with finding ways to automate data theft, according to cybersecurity company Gambit Security. The activity started in December and continued for around a month.
It looks like the hacker was able to essentially jailbreak Claude with prompts, eventually bypassing the chatbot’s guardrails. Claude initially refused the nefarious demands before relenting.
Hackers Used Anthropic’s Claude to Steal 150 GB of Mexican Government Data
> Tell Claude you’re doing a bug bounty
> Claude initially refused:
> “That violates AI safety guidelines”
> Hacker just kept asking
> Claude: “OK, I’ll help”
> Hacked the entire Mexican… pic.twitter.com/Qaux239K8t
— Nawaz Haider (@nawaz0x1) February 25, 2026
“In total, it produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use,” said Curtis Simpson, Gambit Security’s chief strategy officer.
Anthropic has investigated the claims, disrupted the activity and banned all of the accounts involved, according to a company representative. The spokesperson also said that its latest model, Claude Opus 4.6, includes tools to disrupt this kind of misuse.
It’s also been reported that this hacker used ChatGPT to supplement the attacks, turning to OpenAI’s chatbot for information on how to move through computer networks, which credentials were needed to access systems and how to avoid detection. OpenAI says it has identified attempts by the hacker to violate its usage policies and that the tools refused to comply.
The hacker remains unidentified. The attacks haven’t been attributed to a specific group, but Gambit Security did suggest they could be tied to a foreign government. It’s also unclear what the hacker wants to do with all of that data.
Mexico’s national digital agency hasn’t commented on the breach, but did note that cybersecurity is a priority. The state government of Jalisco denies that it was breached, saying only federal networks were impacted. Mexico’s national electoral institute has likewise denied any breaches or unauthorized access in recent months. It’s worth noting that Gambit found at least 20 security vulnerabilities during its research that the country is likely not keen on highlighting.
Anthropic just dropped the core commitment of its safety policy: the promise to not train models it couldn’t prove were safe first.
The new version commits to matching competitors on safety and publishing more transparency reports. But the actual constraint, “we stop if we can’t… pic.twitter.com/k5Zi6dHUMN
— Raphael Pfeiffer (@raphpfei) February 25, 2026
This isn’t the first time Claude has been used for a major cyberattack. Last year, hackers in China manipulated the tool into attempting to infiltrate dozens of global targets, and several of those attempts succeeded. Anthropic also just nixed its long-standing safety pledge, which committed the company to never train an AI system unless it could guarantee in advance that safety measures were adequate. So who knows what fresh hell the future will bring as the company’s tools become more advanced.
This article originally appeared on Engadget at https://www.engadget.com/ai/hacker-used-anthropics-claude-chatbot-to-attack-multiple-government-agencies-in-mexico-171237255.html?src=rss