GPT-5.5 Matches Anthropic's Mythos Preview in Independent Cybersecurity Tests
AISI research shows OpenAI's GPT-5.5 performs on par with Anthropic's Mythos Preview in expert-level security challenges, prompting new defensive measures.
TL;DR
– New AISI research shows OpenAI’s GPT-5.5 performed on par with Anthropic’s Mythos Preview in expert cybersecurity challenges, completing a Rust binary disassembly in under 11 minutes for $1.73.
Context
Last month, Anthropic warned that its Mythos Preview model could pose a significant cyber threat and limited access to select industry partners. OpenAI released GPT-5.5 to the public this week, prompting the UK AI Security Institute (AISI) to evaluate it against the same benchmark suite used for Mythos Preview.
Key Facts
AISI ran 95 Capture-the-Flag (CTF) challenges covering reverse engineering, web exploitation, and cryptography. On the highest-difficulty “Expert” tasks, GPT-5.5 achieved a 71.4% average success rate, edging out Mythos Preview’s 68.6%: a difference within statistical variance, but still notable. In a demanding Rust binary disassembler challenge, GPT-5.5 produced a working disassembler in 10 minutes 22 seconds without human input, incurring $1.73 in API usage. The institute also tested a 32-step data-extraction scenario called “The Last Ones” (TLO): GPT-5.5 succeeded in 3 of 10 runs, Mythos Preview in 2 of 10, and no prior model had succeeded even once. Both models failed the “Cooling Tower” simulation, which mimics an attack on power-plant control software.
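To see why the 71.4% vs 68.6% gap sits within statistical variance, a two-proportion z-test is a quick sanity check. AISI did not publish the number of Expert-tier tasks per model, so the sample size of 35 below is a hypothetical value chosen for illustration; the success rates are the reported figures.

```python
from math import sqrt

def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """Return the z statistic for the difference between two success rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Reported Expert-tier rates; n = 35 per model is an assumed sample size.
z = two_proportion_z(0.714, 0.686, 35, 35)
print(f"z = {z:.2f}")  # far below the 1.96 threshold for 5% significance
```

At any plausible sample size in this range the z statistic stays well under 1.96, which is consistent with the article's framing that the gap is within noise.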
What It Means
The results suggest that publicly available AI models can now approach the specialized capabilities previously reserved for restricted, partner-only systems. While the performance gap remains narrow, the ability to automate reverse-engineering tasks and multi-step data extraction could lower the barrier for less-skilled threat actors. Security teams should assume that AI-assisted tools may soon be used to accelerate exploit development, reconnaissance, and exfiltration.
What Defenders Should Do
– Patch promptly: Apply the latest firmware and software updates, especially for software that is increasingly targeted for binary analysis, such as Rust binaries.
– Monitor API usage: Flag anomalous outbound calls to AI services, which may indicate an attempt to outsource code analysis.
– Enforce strict code review: Deploy static analysis tools that can detect automatically generated disassembly or deobfuscation scripts.
– Update detection signatures: Add coverage for MITRE ATT&CK techniques such as T1027 (Obfuscated Files or Information) and T1059.001 (PowerShell), paired with heuristics for AI-generated payloads.
– Limit data exposure: Segment networks to restrict lateral movement, reducing the success probability of multi-step extraction attacks like TLO.
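The "monitor API usage" step above can be sketched as a simple watchlist scan over egress logs. Everything here is illustrative: the domain list, the log format, and the function name are assumptions, not a vetted detection rule, and real deployments would work from proxy or DNS telemetry with proper allow-listing.

```python
import re

# Hypothetical watchlist of AI-service API endpoints (illustrative only).
AI_API_DOMAINS = (
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
)

def flag_ai_calls(log_lines):
    """Yield (line_no, domain) for log lines that contact a watched endpoint."""
    pattern = re.compile("|".join(re.escape(d) for d in AI_API_DOMAINS))
    for line_no, line in enumerate(log_lines, start=1):
        match = pattern.search(line)
        if match:
            yield line_no, match.group(0)

# Toy egress log in an assumed "<time> <host> <method> <url> <status>" shape.
sample = [
    "10:01 host1 GET https://example.com/index.html 200",
    "10:02 build7 POST https://api.openai.com/v1/responses 200",
]
print(list(flag_ai_calls(sample)))  # [(2, 'api.openai.com')]
```

In practice a hit is only a lead, not an alert: sanctioned AI use would be filtered out first, and volume or timing anomalies (e.g. a build server suddenly calling an AI API) are what make a hit worth triaging.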
Looking Ahead
Watch for AISI’s next benchmark release, which will test AI models against real-world industrial control system scenarios. Continuous evaluation will be essential as AI capabilities evolve.