When the GPU Won’t Talk
If you’ve ever tried to spin up a local LLM with Ollama on a fresh Ubuntu 22.04 VPS, you know the excitement of watching a model load in seconds. That excitement quickly turns into frustration when the console spits out:
```text
Ollama – model loading failed: CUDA driver not found
```
Because the error message is so generic, it’s easy to spend hours going down the wrong rabbit hole: re‑installing Python, pulling the wrong Docker image, or even reinstalling the entire OS. In this post I walk you through the exact steps I used to get Ollama talking to the NVIDIA driver on Ubuntu 22.04, why each step matters, and a few post‑fix checks that keep your AI tools running at peak performance.
Quick Overview
Use case: Running Ollama locally with GPU acceleration for AI automation and VPS deployment.
Difficulty level: Intermediate (basic Linux and NVIDIA driver knowledge).
Estimated fix time: 20‑30 minutes.
Required tools/stack: Ubuntu 22.04, NVIDIA GPU, nvidia‑driver, nvidia‑container‑toolkit, Docker, Ollama binary.
Requirements & Tools
- Ubuntu 22.04 LTS (desktop or server)
- Supported NVIDIA GPU (Compute Capability ≥ 5.0)
- Root or sudo access
- Internet connectivity for package downloads
- Docker Engine (≥ 20.10) – optional if you prefer the containerized Ollama
- Ollama binary (downloaded from ollama.ai)
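Before touching anything, it can save time to confirm the basics in one pass. The script below is only a minimal pre‑flight sketch using standard Ubuntu tools; adjust it to your environment.

```bash
#!/usr/bin/env bash
# Pre-flight sanity check before reinstalling drivers.
set -u

echo "== OS release =="
lsb_release -ds                      # expect something like "Ubuntu 22.04.x LTS"

echo "== NVIDIA device on the PCI bus =="
lspci | grep -i nvidia || echo "no NVIDIA device found"

echo "== Installed NVIDIA packages =="
dpkg -l | grep -i '^ii.*nvidia' || echo "none installed"

echo "== Docker (only needed for the containerized route) =="
command -v docker >/dev/null && docker --version || echo "Docker not installed"
```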
Step‑by‑Step Fix
- Verify the GPU is recognized by the OS.
```bash
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GeForce RTX 3080 (rev a1)
```
If you see nothing, you’re either on a virtual machine without GPU pass‑through or the hardware isn’t installed.
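To tell those two cases apart, check whether you are running inside a VM and whether any kernel driver is bound to the card. This is purely a diagnostic sketch using standard tools:

```bash
# "none" means bare metal; anything else means a hypervisor is involved
$ systemd-detect-virt

# If the card is listed, the "Kernel driver in use" line shows what is bound to it
$ lspci -k | grep -A 3 -i nvidia
```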
- Purge any old NVIDIA packages. Conflicting drivers are the most common cause of “CUDA driver not found”.
```bash
$ sudo apt-get purge '^nvidia-.*' && sudo apt-get autoremove -y
```
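Before installing the new driver, it is worth confirming the purge really removed everything; leftovers are exactly what this step is meant to eliminate:

```bash
# No packages should remain installed
$ dpkg -l | grep -i '^ii.*nvidia'

# Old kernel modules may stay loaded until the next reboot; that is expected
$ lsmod | grep -i nvidia
```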
- Add the official graphics drivers PPA and install the latest stable driver.
```bash
$ sudo add-apt-repository ppa:graphics-drivers/ppa -y
$ sudo apt update
$ sudo ubuntu-drivers autoinstall
```
After installation, reboot:
```bash
$ sudo reboot
```
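When the machine comes back, confirm the new module actually loaded before moving on; on Secure Boot systems, for example, an unsigned module can silently fail to load and you end up back at the same generic error.

```bash
# The nvidia kernel module should be listed
$ lsmod | grep ^nvidia

# And the driver should answer
$ nvidia-smi
```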
- Make sure the driver’s runtime library is in place. Ollama doesn’t need the full CUDA toolkit (`nvcc` and friends); it needs `libcuda.so`, which ships with the driver. The `cuda-drivers` metapackage pulls it in if it isn’t already present.

```bash
$ sudo apt-get install -y cuda-drivers
```
Confirm the driver version matches the CUDA runtime:
```bash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.06      Driver Version: 535.54.06      CUDA Version: 12.2 |
+-----------------------------------------------------------------------------+
```
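It is also worth checking that the dynamic linker can resolve `libcuda.so`, since that is the library Ollama ultimately needs (the exact path varies with the driver version):

```bash
# Should list libcuda.so.1 pointing into the installed driver's directory
$ ldconfig -p | grep libcuda
```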
- Set up the NVIDIA Container Toolkit. This step is required if you run Ollama inside Docker (the default on many VPS setups).
```bash
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
```
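Before involving Ollama, you can confirm the container wiring with a throwaway CUDA container. The image tag below is just an example; pick one that matches your installed CUDA version:

```bash
# If this prints the same table as the host's nvidia-smi, the toolkit is working
$ docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```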
- Pull the GPU‑enabled Ollama image (or use the native binary).
```bash
# Native binary (recommended for a quick test)
$ wget https://ollama.ai/download/ollama-linux-amd64 -O ollama
$ chmod +x ollama
$ sudo mv ollama /usr/local/bin/

# OR Docker version
$ docker pull ollama/ollama:latest
$ docker run --gpus all -d --name ollama -p 11434:11434 ollama/ollama
```
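If you choose the native binary on a VPS, you will probably want systemd to supervise it rather than running `ollama serve` in a shell. This is a minimal sketch that assumes a dedicated `ollama` system user already exists; adapt paths and hardening to taste.

```bash
# Create a bare-bones service unit for the native binary (illustrative only)
sudo tee /etc/systemd/system/ollama.service >/dev/null <<'EOF'
[Unit]
Description=Ollama LLM server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ollama
```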
- Run a test model. If the driver is correctly exposed, the model loads without the CUDA error.
```bash
$ ollama pull llama2                     # native binary
# or
$ docker exec ollama ollama pull llama2  # container
```
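Pulling only downloads the weights; to confirm inference works end to end, send a small prompt through the HTTP API (port 11434 is the default, and the model name matches the pull above):

```bash
# The server answers with a JSON response if loading and inference succeed
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "llama2", "prompt": "Say hello in one word.", "stream": false}'
```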
- Verify GPU usage. A quick `nvidia-smi` while the model loads should show a Python or Ollama process.

```bash
$ watch -n1 nvidia-smi
```
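If you prefer a one‑shot check to a live view, `nvidia-smi` can list the processes holding GPU memory directly:

```bash
# PID, process name, and memory for every compute process on the GPU
$ nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```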
Common Mistakes & Why They Happen
- Installing only the CUDA toolkit but not the driver. The toolkit provides `nvcc`, but Ollama looks for `libcuda.so`, which lives in the driver package.
- Running Ubuntu on a cloud VM without GPU pass‑through. The OS may see a virtual GPU, but `nvidia-smi` fails, leading to the same error.
- Mixing Docker runtimes. If Docker uses the default `runc` instead of the `nvidia` runtime, the container won’t see the driver (see the check after this list).
- Legacy driver files left in `/usr/lib/x86_64-linux-gnu/`. Old `libcuda.so.1` copies can shadow the new version, causing mismatched API calls.
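The last two mistakes are quick to rule out. A short diagnostic sketch, not an exhaustive audit:

```bash
# Which runtimes does Docker know about, and which one is the default?
$ docker info | grep -i runtime

# Any stray copies of libcuda outside the driver's own directory?
$ find /usr -name 'libcuda.so*' 2>/dev/null
```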
Optimization Tips & Follow‑up Checks
- Enable `nvidia-persistenced` so the driver stays initialized even when no client is attached; this avoids slow first loads (a combined sketch follows this list).
- Set `CUDA_VISIBLE_DEVICES=0` (or your GPU index) before launching Ollama for deterministic GPU selection.
- Monitor memory usage with `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` to avoid OOM during large model pulls.
- Pin the driver version (e.g., `sudo apt-mark hold nvidia-driver-535`) if you run a production VPS that shouldn’t auto‑upgrade.
- Enable verbose logging (e.g., `OLLAMA_DEBUG=1 ollama serve`); the logs often surface mismatched library paths before the generic “driver not found” message.
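A combined sketch of the persistence, GPU‑selection, and version‑pinning tips, assuming a single‑GPU host; swap in the driver package name you actually installed:

```bash
# Keep the driver initialized even when no client is attached
$ sudo systemctl enable --now nvidia-persistenced

# Make GPU selection explicit for the Ollama process
$ export CUDA_VISIBLE_DEVICES=0
$ ollama serve

# Stop unattended upgrades from replacing a known-good driver
$ sudo apt-mark hold nvidia-driver-535
```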
Real‑World Scenario: Deploying a Chatbot on a 4‑GPU VPS
At my last contract I needed to host a customer‑facing chatbot on a 4× RTX 4090 VPS. After the initial Ollama install, the first model pull failed with the exact CUDA error. By following the steps above, I:
- Installed the 535 driver across all GPUs.
- Configured `/etc/docker/daemon.json` with `"default-runtime": "nvidia"`.
- Set `CUDA_VISIBLE_DEVICES=0,1,2,3` in the systemd service file for Ollama (a sketch of both files follows).
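For reference, here is roughly what those two pieces of configuration look like; treat the exact paths and runtime entry as illustrative rather than canonical:

```bash
# /etc/docker/daemon.json -- make the NVIDIA runtime the default for all containers
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker

# systemd drop-in for the Ollama service -- expose all four GPUs explicitly
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/gpus.conf >/dev/null <<'EOF'
[Service]
Environment=CUDA_VISIBLE_DEVICES=0,1,2,3
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```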
Result: the bot answered 120 RPS with sub‑50 ms latency, and `nvidia-smi` showed balanced memory usage across all four cards. The fix turned a blocking error into a production‑ready AI service.
Before & After Comparison
| Metric | Before Fix | After Fix |
|---|---|---|
| Model Load Time | Failed – “CUDA driver not found” | ~8 seconds (GPU) |
| GPU Utilization (peak) | 0 % | 78 % (RTX 3080) |
| Memory Errors | Frequent “libcuda.so” not found | None after driver install |