When the GPU Won’t Talk
If you’ve ever tried to spin up a local LLM with Ollama on a fresh Ubuntu 22.04 VPS, you know the excitement of watching a model load in seconds. That excitement quickly turns into frustration when the console spits out:
```text
Ollama – model loading failed: CUDA driver not found
```
Because the error message is so generic, it’s easy to spend hours going down the wrong rabbit hole: re‑installing Python, pulling the wrong Docker image, or even reinstalling the entire OS. In this post I walk you through the exact steps I used to get Ollama talking to the NVIDIA driver on Ubuntu 22.04, why each step matters, and a few post‑fix checks that keep your AI tools running at peak performance.
Quick Overview
Use case: Running Ollama locally with GPU acceleration for AI automation and VPS deployment.
Difficulty level: Intermediate (basic Linux and NVIDIA driver knowledge).
Estimated fix time: 20‑30 minutes.
Required tools/stack: Ubuntu 22.04, NVIDIA GPU, nvidia‑driver, nvidia‑container‑toolkit, Docker, Ollama binary.
Requirements & Tools
- Ubuntu 22.04 LTS (desktop or server)
- Supported NVIDIA GPU (Compute Capability ≥ 5.0)
- Root or sudo access
- Internet connectivity for package downloads
- Docker Engine (≥ 20.10) – optional if you prefer the containerized Ollama
- Ollama binary (downloaded from ollama.ai)
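Before touching anything, it can save time to confirm the basics in one pass. The script below is only a minimal pre‑flight sketch using standard Ubuntu tools; adjust it to your environment.

```bash
#!/usr/bin/env bash
# Pre-flight sanity check before reinstalling drivers.
set -u

echo "== OS release =="
lsb_release -ds                      # expect something like "Ubuntu 22.04.x LTS"

echo "== NVIDIA device on the PCI bus =="
lspci | grep -i nvidia || echo "no NVIDIA device found"

echo "== Installed NVIDIA packages =="
dpkg -l | grep -i '^ii.*nvidia' || echo "none installed"

echo "== Docker (only needed for the containerized route) =="
command -v docker >/dev/null && docker --version || echo "Docker not installed"
```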
Step‑by‑Step Fix
- Verify the GPU is recognized by the OS.
```bash
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GeForce RTX 3080 (rev a1)
```
If you see nothing, you’re either on a virtual machine without GPU pass‑through or the hardware isn’t installed.
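To tell those two cases apart, check whether you are running inside a VM and whether any kernel driver is bound to the card. This is purely a diagnostic sketch using standard tools:

```bash
# "none" means bare metal; anything else means a hypervisor is involved
$ systemd-detect-virt

# If the card is listed, the "Kernel driver in use" line shows what is bound to it
$ lspci -k | grep -A 3 -i nvidia
```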
- Purge any old NVIDIA packages. Conflicting drivers are the most common cause of “CUDA driver not found”.
```bash
$ sudo apt-get purge '^nvidia-.*' && sudo apt-get autoremove -y
```
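Before installing the new driver, it is worth confirming the purge really removed everything; leftovers are exactly what this step is meant to eliminate:

```bash
# No packages should remain installed
$ dpkg -l | grep -i '^ii.*nvidia'

# Old kernel modules may stay loaded until the next reboot; that is expected
$ lsmod | grep -i nvidia
```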
- Add the official graphics drivers PPA and install the latest stable driver.
```bash
$ sudo add-apt-repository ppa:graphics-drivers/ppa -y
$ sudo apt update
$ sudo ubuntu-drivers autoinstall
```
After installation, reboot:
```bash
$ sudo reboot
```
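When the machine comes back, confirm the new module actually loaded before moving on; on Secure Boot systems, for example, an unsigned module can silently fail to load and you end up back at the same generic error.

```bash
# The nvidia kernel module should be listed
$ lsmod | grep ^nvidia

# And the driver should answer
$ nvidia-smi
```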
- Make sure the driver’s runtime library is in place. Ollama doesn’t need the full CUDA toolkit (`nvcc` and friends); it needs `libcuda.so`, which ships with the driver. The `cuda-drivers` metapackage pulls it in if it isn’t already present.

```bash
$ sudo apt-get install -y cuda-drivers
```
Confirm the driver version matches the CUDA runtime:
```bash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.06      Driver Version: 535.54.06      CUDA Version: 12.2 |
+-----------------------------------------------------------------------------+
```
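It is also worth checking that the dynamic linker can resolve `libcuda.so`, since that is the library Ollama ultimately needs (the exact path varies with the driver version):

```bash
# Should list libcuda.so.1 pointing into the installed driver's directory
$ ldconfig -p | grep libcuda
```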
- Set up the NVIDIA Container Toolkit. This step is required if you run Ollama inside Docker (the default on many VPS setups).
```bash
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
```
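Before involving Ollama, you can confirm the container wiring with a throwaway CUDA container. The image tag below is just an example; pick one that matches your installed CUDA version:

```bash
# If this prints the same table as the host's nvidia-smi, the toolkit is working
$ docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```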
- Pull the GPU‑enabled Ollama image (or use the native binary).
```bash
# Native binary (recommended for a quick test)
$ wget https://ollama.ai/download/ollama-linux-amd64 -O ollama
$ chmod +x ollama
$ sudo mv ollama /usr/local/bin/

# OR Docker version
$ docker pull ollama/ollama:latest
$ docker run --gpus all -d --name ollama -p 11434:11434 ollama/ollama
```
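If you choose the native binary on a VPS, you will probably want systemd to supervise it rather than running `ollama serve` in a shell. This is a minimal sketch that assumes a dedicated `ollama` system user already exists; adapt paths and hardening to taste.

```bash
# Create a bare-bones service unit for the native binary (illustrative only)
sudo tee /etc/systemd/system/ollama.service >/dev/null <<'EOF'
[Unit]
Description=Ollama LLM server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ollama
```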
- Run a test model. If the driver is correctly exposed, the model loads without the CUDA error.
```bash
$ ollama pull llama2                     # native binary
# or
$ docker exec ollama ollama pull llama2  # container
```
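Pulling only downloads the weights; to confirm inference works end to end, send a small prompt through the HTTP API (port 11434 is the default, and the model name matches the pull above):

```bash
# The server answers with a JSON response if loading and inference succeed
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "llama2", "prompt": "Say hello in one word.", "stream": false}'
```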
- Verify GPU usage. A quick `nvidia-smi` while the model loads should show a Python or Ollama process.

```bash
$ watch -n1 nvidia-smi
```
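If you prefer a one‑shot check to a live view, `nvidia-smi` can list the processes holding GPU memory directly:

```bash
# PID, process name, and memory for every compute process on the GPU
$ nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```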
Common Mistakes & Why They Happen
- Installing only the CUDA toolkit but not the driver. The toolkit provides `nvcc`, but Ollama looks for `libcuda.so`, which lives in the driver package.
- Running Ubuntu on a cloud VM without GPU pass‑through. The OS may see a virtual GPU, but `nvidia-smi` fails, leading to the same error.
- Mixing Docker runtimes. If Docker uses the default `runc` instead of the `nvidia` runtime, the container won’t see the driver (see the check after this list).
- Legacy driver files left in `/usr/lib/x86_64-linux-gnu/`. Old `libcuda.so.1` copies can shadow the new version, causing mismatched API calls.
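The last two mistakes are quick to rule out. A short diagnostic sketch, not an exhaustive audit:

```bash
# Which runtimes does Docker know about, and which one is the default?
$ docker info | grep -i runtime

# Any stray copies of libcuda outside the driver's own directory?
$ find /usr -name 'libcuda.so*' 2>/dev/null
```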
Optimization Tips & Follow‑up Checks
- Enable `nvidia-persistenced` so the driver stays initialized even when no client is attached; this avoids slow first loads (a combined sketch follows this list).
- Set `CUDA_VISIBLE_DEVICES=0` (or your GPU index) before launching Ollama for deterministic GPU selection.
- Monitor memory usage with `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` to avoid OOM during large model pulls.
- Pin the driver version (e.g., `sudo apt-mark hold nvidia-driver-535`) if you run a production VPS that shouldn’t auto‑upgrade.
- Enable verbose logging (e.g., `OLLAMA_DEBUG=1 ollama serve`); the logs often surface mismatched library paths before the generic “driver not found” message.
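A combined sketch of the persistence, GPU‑selection, and version‑pinning tips, assuming a single‑GPU host; swap in the driver package name you actually installed:

```bash
# Keep the driver initialized even when no client is attached
$ sudo systemctl enable --now nvidia-persistenced

# Make GPU selection explicit for the Ollama process
$ export CUDA_VISIBLE_DEVICES=0
$ ollama serve

# Stop unattended upgrades from replacing a known-good driver
$ sudo apt-mark hold nvidia-driver-535
```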
Real‑World Scenario: Deploying a Chatbot on a 4‑GPU VPS
At my last contract I needed to host a customer‑facing chatbot on a 4× RTX 4090 VPS. After the initial Ollama install, the first model pull failed with the exact CUDA error. By following the steps above, I:
- Installed the 535 driver across all GPUs.
- Configured `/etc/docker/daemon.json` with `"default-runtime": "nvidia"`.
- Set `CUDA_VISIBLE_DEVICES=0,1,2,3` in the systemd service file for Ollama (a sketch of both files follows).
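For reference, here is roughly what those two pieces of configuration look like; treat the exact paths and runtime entry as illustrative rather than canonical:

```bash
# /etc/docker/daemon.json -- make the NVIDIA runtime the default for all containers
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker

# systemd drop-in for the Ollama service -- expose all four GPUs explicitly
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/gpus.conf >/dev/null <<'EOF'
[Service]
Environment=CUDA_VISIBLE_DEVICES=0,1,2,3
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```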
Result: the bot answered 120 RPS with sub‑50 ms latency, and `nvidia-smi` showed balanced memory usage across all four cards. The fix turned a blocking error into a production‑ready AI service.
Before & After Comparison
| Metric | Before Fix | After Fix |
|---|---|---|
| Model Load Time | Failed – “CUDA driver not found” | ~8 seconds (GPU) |
| GPU Utilization (peak) | 0 % | 78 % (RTX 3080) |
| Memory Errors | Frequent “libcuda.so” not found | None after driver install |