Ollama on Ubuntu 22.04 keeps crashing with “CUDA out of memory” after vLLM 0.5 upgrade – step‑by‑step fix for the GPU memory leak in Docker

You’ve upgraded vLLM to 0.5, and now your Ollama setup on Ubuntu 22.04 is crashing hard. The error message stares back at you: “CUDA out of memory.” Your Docker container was running smoothly yesterday. Today? It’s a memory leak nightmare. You’re not alone: this is a known issue affecting developers deploying large language models (LLMs) in Docker.
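
Before applying any fix, it helps to confirm the crash really is GPU memory exhaustion and not the host kernel’s OOM killer. A minimal diagnostic sketch is below; it assumes your container is named `ollama`, so substitute your own container name.

```bash
# Watch GPU memory while the model loads; usage climbing to the card's
# limit right before the crash points at CUDA memory exhaustion.
watch -n 1 nvidia-smi

# In another terminal, tail the container logs for the CUDA OOM message.
# "ollama" is a placeholder container name; substitute your own.
docker logs -f ollama 2>&1 | grep -i "out of memory"

# Check how the container exited: a non-zero exit code with OOMKilled=false
# suggests the GPU error, while OOMKilled=true points at host RAM instead.
docker inspect ollama --format '{{.State.ExitCode}} {{.State.OOMKilled}}'
```

If `nvidia-smi` shows memory pinned at the card’s limit right before the crash and the logs contain the CUDA error, you’re looking at the GPU leak covered in the steps that follow.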