“Bleeding Llama” — Critical Ollama Vulnerability Exposes 300,000+ AI Servers to Remote Memory Leak

Summary

A critical out-of-bounds read vulnerability (CVE-2026-7482, CVSS 9.1) has been disclosed in Ollama, the wildly popular open-source framework for running large language models locally. Dubbed “Bleeding Llama” by researchers at Cyera, the flaw allows a remote, unauthenticated attacker to leak the entire process memory of an Ollama server — potentially exposing API keys, environment variables, system prompts, and even other users’ conversation data.

The vulnerability exists in Ollama’s GGUF model loader. By uploading a specially crafted GGUF file with an inflated tensor shape to the /api/create endpoint, an attacker can trigger a heap out-of-bounds read during quantization. The leaked memory can then be exfiltrated by pushing the resulting model artifact to an attacker-controlled registry via /api/push. Ollama versions prior to 0.17.1 are affected, and with over 300,000 Ollama instances reachable on the internet, the attack surface is massive.

The root cause is Ollama’s use of Go’s unsafe package when processing GGUF files, bypassing the language’s memory safety guarantees. Ollama has patched the issue in version 0.17.1.

Sources

Commentary

This is exactly the kind of vulnerability that became inevitable once AI infrastructure went from “cool local experiment” to “production deployment on 300K+ servers.” Ollama’s convenience — spin up an LLM in seconds — came with the tradeoff of minimal security hardening. Using Go’s unsafe package to handle untrusted model files is a textbook recipe for memory corruption bugs.

The real concern isn’t just the vulnerability itself — it’s the attack surface. Most of those 300,000 servers are likely running without authentication, exposing not just the model but every secret in process memory. If you’re running Ollama publicly, patch to 0.17.1 immediately and seriously reconsider whether it should be internet-facing at all.

You May Have Missed