Anthropic Withholds Claude Mythos Preview After It Autonomously Discovered Thousands of Zero-Day Vulnerabilities

Summary

Anthropic has made the unprecedented decision to withhold its most advanced AI model, Claude Mythos Preview, from public release after internal testing revealed it could autonomously discover and exploit thousands of previously unknown software vulnerabilities — including decades-old flaws — without human intervention.

The model triggered Anthropic’s ASL-3 safety classification for cybersecurity capabilities, with some reports suggesting it approached ASL-4 thresholds. This marks the first time a major AI lab has refused to ship a frontier model purely on security grounds, signaling a dramatic shift in how the industry thinks about deployment constraints.

Instead of a public launch, Anthropic initiated Project Glasswing on April 7, 2026 — a controlled program giving defensive access to select technology and cybersecurity firms so they can use the model’s capabilities to find vulnerabilities in their own infrastructure.

Commentary

This is a watershed moment for AI safety. The fact that general improvements in code reasoning and autonomy accidentally produced a world-class vulnerability discovery engine should terrify anyone building software. The model's offensive cyber capabilities weren't designed; they emerged.

Project Glasswing is a clever middle ground: the model’s power gets channeled defensively rather than locked in a vault. But the broader implication is clear — we’re entering an era where shipping the most capable AI models may become a national security decision, not a product launch. Every AI lab will eventually face this same dilemma, and not all of them will make the same choice Anthropic did.
