Cybersecurity · 3 hrs ago

GPT-5.5 Matches Anthropic's Mythos Preview in Independent Cybersecurity Tests

AISI research shows OpenAI's GPT-5.5 performs on par with Anthropic's Mythos Preview in expert-level security challenges, prompting new defensive measures.

Peter Olaleru · 3 min read · US

Cybersecurity Editor


New AISI research shows OpenAI’s GPT-5.5 performed on par with Anthropic’s Mythos Preview in expert cybersecurity challenges, completing a Rust binary disassembly in under 11 minutes for $1.73.

Context Last month Anthropic warned that its Mythos Preview model could pose a significant cyber threat, limiting access to select industry partners. OpenAI released GPT-5.5 to the public this week, prompting the UK AI Security Institute (AISI) to evaluate it against the same benchmark suite used for Mythos Preview.

Key Facts
- AISI ran 95 Capture-the-Flag (CTF) challenges covering reverse engineering, web exploitation, and cryptography.
- On the highest-difficulty “Expert” tasks, GPT-5.5 achieved a 71.4% average success rate, edging out Mythos Preview’s 68.6%, a difference within statistical variance but still notable.
- In a demanding Rust binary disassembler challenge, GPT-5.5 produced a working disassembler in 10 minutes 22 seconds without human input, incurring $1.73 in API usage.
- The institute also tested a 32-step data-extraction scenario called “The Last Ones” (TLO). GPT-5.5 succeeded in 3 of 10 runs, while Mythos Preview succeeded in 2 of 10; no prior model had ever succeeded even once.
- Both models failed the “Cooling Tower” simulation, which mimics an attack on power-plant control software.
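The “within statistical variance” characterization can be sanity-checked with a two-proportion z-test. A minimal sketch, assuming both models attempted the same Expert task set and using n = 95 for each (AISI has not published the exact size of the Expert subset, so that sample size is an assumption):

```python
from math import sqrt

def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z-statistic for the difference between two success proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# GPT-5.5 (71.4%) vs. Mythos Preview (68.6%); n = 95 is an assumption
z = two_proportion_z(0.714, 0.686, 95, 95)
print(f"z = {z:.2f}")  # ≈ 0.42, far below the 1.96 threshold for p < 0.05
```

Even if the true Expert subset is smaller, a z-score this far below 1.96 supports the article's point that the 2.8-point gap is not statistically significant.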

What It Means The results suggest that publicly available AI models can now approach the specialized capabilities previously reserved for restricted, partner‑only systems. While the performance gap remains narrow, the ability to automate reverse‑engineering tasks and multi‑step data extraction could lower the barrier for less‑skilled threat actors. Security teams should assume that AI‑assisted tools may soon be used to accelerate exploit development, reconnaissance, and exfiltration.

What Defenders Should Do
- Patch promptly: Apply the latest firmware and software updates, especially for software whose compiled binaries (such as Rust) are increasingly attractive targets for automated binary analysis.
- Monitor API usage: Flag anomalous outbound calls to AI services, which may indicate an attempt to outsource code analysis.
- Enforce strict code review: Deploy static analysis tools that can detect automatically generated disassembly or deobfuscation scripts.
- Update detection signatures: Extend coverage of MITRE ATT&CK techniques such as T1027 (Obfuscated Files or Information) and T1059.001 (PowerShell) with heuristics for AI-generated payloads.
- Limit data exposure: Segment networks to restrict lateral movement, reducing the success probability of multi-step extraction attacks like TLO.
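The API-monitoring recommendation above can be prototyped as a simple proxy-log scan. A minimal sketch, assuming a simplified log format of `<timestamp> <src_ip> <host> <bytes>`; the watchlist hosts and byte threshold are illustrative placeholders to be replaced with your environment's actual egress logs and policy:

```python
import re

# Hypothetical watchlist of AI-service API hosts; extend for your environment.
AI_API_HOSTS = {"api.openai.com", "api.anthropic.com"}

# Assumed simplified proxy-log format: '<timestamp> <src_ip> <host> <bytes>'
LOG_LINE = re.compile(r"^\S+ \S+ (?P<host>\S+) (?P<bytes>\d+)$")

def flag_ai_calls(log_lines, byte_threshold=50_000):
    """Return (host, bytes) pairs for large outbound requests to AI APIs."""
    hits = []
    for line in log_lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue
        host, nbytes = m["host"], int(m["bytes"])
        if host in AI_API_HOSTS and nbytes >= byte_threshold:
            hits.append((host, nbytes))
    return hits

sample = [
    "2025-01-10T09:12:01 10.0.0.5 api.openai.com 120000",
    "2025-01-10T09:12:02 10.0.0.5 example.com 300",
]
print(flag_ai_calls(sample))  # [('api.openai.com', 120000)]
```

In practice this logic would live in a SIEM rule rather than a standalone script, and the byte threshold should be tuned to separate sanctioned AI usage from bulk code uploads.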

Looking Ahead Watch for AISI’s next benchmark release, which will test AI models against real‑world industrial control system scenarios. Continuous evaluation will be essential as AI capabilities evolve.
