Cybersecurity · 3 hrs ago

Anthropic Investigates Unauthorized Access to Claude Mythos Preview Model

Anthropic investigates unauthorized access to its Mythos AI model after a contractor leak; details on impact, mitigations, and what to watch next.

By Peter Olaleru, Cybersecurity Editor · 3 min read

Source: The Guardian

Anthropic is investigating a report that fewer than five users accessed its Claude Mythos Preview model through a third‑party contractor environment. The incident occurred on the same day the model was being shared with select partners such as Apple and Goldman Sachs for testing.

Context

Anthropic confirmed the probe after Bloomberg reported on a private online forum where the users obtained access to Mythos via a contractor’s credentials. The model has not been publicly released because of its demonstrated ability to automate multi‑step cyber‑attack sequences. The UK AI Security Institute previously tested Mythos and found it could complete a 32‑step attack simulation in three out of ten attempts.

Key Facts

- The access vector was a compromised contractor account that allowed the users to reach the Mythos preview in Anthropic’s vendor environment.
- There is no evidence the group ran malicious prompts; they reportedly “played around” with the model, according to Bloomberg’s screenshots and a live demo.
- Anthropic has not disclosed any data exfiltration, system disruption, or financial loss tied to the incident.
- The model’s capability to autonomously identify IT weaknesses raises concern that unauthorized use could accelerate real‑world attack planning.

What It Means

The breach highlights risks in third‑party access controls for high‑risk AI assets. Organizations should treat AI models with dangerous capabilities as privileged resources and apply the same safeguards they would to critical infrastructure.

Mitigations

- Enforce least‑privilege access and require MFA for all vendor accounts that can reach AI model repositories.
- Monitor API and UI usage for anomalous patterns, such as sudden spikes in query volume or atypical prompt sequences (MITRE ATT&CK T1078 – Valid Accounts, T1133 – External Remote Services); see the sketch after this list.
- Apply network segmentation to isolate AI development environments from corporate networks.
- Review and update vendor risk assessments, including contractual obligations for security controls and incident reporting.
- Deploy detection signatures for known abuse patterns, e.g., repeated attempts to trigger multi‑step attack simulations (a custom rule based on prompt length and sequencing).
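As a rough illustration of the monitoring and detection bullets above, the sketch below flags vendor accounts whose hourly query volume spikes past a baseline, plus prompts that enumerate long runs of numbered steps. The log schema (ApiEvent), every threshold, and the step-sequence heuristic are assumptions chosen for illustration, not Anthropic’s or any vendor’s actual detection logic.

```python
"""Minimal sketch of the monitoring checks described above.

Illustrative only: the log schema, thresholds, and the multi-step-prompt
heuristic are assumptions, not any vendor's real detection rules.
"""
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Assumed thresholds; tune against your own baseline traffic.
HOURLY_BASELINE = 50        # typical queries per hour for a vendor account
SPIKE_MULTIPLIER = 5        # alert at 5x baseline
MIN_PROMPT_CHARS = 2000     # unusually long prompts
MIN_NUMBERED_STEPS = 8      # prompts that enumerate many sequential steps

NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s")  # lines like "1. ..." or "2) ..."


@dataclass
class ApiEvent:
    timestamp: datetime
    account: str   # vendor/contractor account ID (assumed log field)
    prompt: str


def hourly_volumes(events: list[ApiEvent]) -> dict[tuple[str, str], int]:
    """Count queries per (account, hour) bucket."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for e in events:
        counts[(e.account, e.timestamp.strftime("%Y-%m-%dT%H"))] += 1
    return counts


def looks_like_step_sequence(prompt: str) -> bool:
    """Crude heuristic: a long prompt that enumerates many numbered steps."""
    numbered = sum(1 for line in prompt.splitlines() if NUMBERED_STEP.match(line))
    return len(prompt) >= MIN_PROMPT_CHARS and numbered >= MIN_NUMBERED_STEPS


def flag_anomalies(events: list[ApiEvent]) -> list[str]:
    """Return human-readable alerts for volume spikes and step-sequence prompts."""
    alerts = []
    for (account, hour), n in hourly_volumes(events).items():
        if n > HOURLY_BASELINE * SPIKE_MULTIPLIER:
            alerts.append(f"volume spike: {account} made {n} queries in hour {hour}")
    for e in events:
        if looks_like_step_sequence(e.prompt):
            alerts.append(
                f"multi-step prompt pattern from {e.account} "
                f"at {e.timestamp:%Y-%m-%d %H:%M}"
            )
    return alerts
```

In practice, checks like these would run over gateway or SIEM logs and feed alerts tagged with the T1078/T1133 techniques noted above, rather than run as a standalone script.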

What to Watch Next

Watch for the results of Anthropic’s investigation, any advisory from the UK AI Security Institute, and whether similar leaks emerge at other foundation model providers.

