Ollama on Ubuntu 24.04 with NVIDIA: Clean Install + Driver Pitfalls
The promise sounds simple: run a single curl command and deploy local LLMs with GPU acceleration. The reality? Driver conflicts, CUDA mismatches, and GPU detection failures that can derail your AI infrastructure deployment. This comprehensive guide navigates the gap between marketing simplicity and production reliability.
Bottom Line Up Front
Ollama's one-command installation works flawlessly—on the software side. GPU acceleration depends entirely on having correctly configured NVIDIA drivers before Ollama installation. The command curl -fsSL https://ollama.com/install.sh | sh installs Ollama in seconds, but your NVIDIA driver configuration determines whether you get GPU-accelerated inference that runs roughly 5-10x faster or fall back to sluggish CPU processing.
Understanding the Ollama Architecture on Ubuntu
Ollama revolutionizes local LLM deployment by providing a streamlined interface for running models like Llama 2, Code Llama, Mixtral, and Gemma on your own hardware. Unlike cloud-based AI services that send your prompts and data to external servers, Ollama processes everything locally—delivering complete data privacy and eliminating ongoing API costs.
The architecture consists of three critical layers that must work in harmony for optimal performance. At the foundation sits your NVIDIA GPU driver stack, which bridges hardware capabilities to software interfaces. The middle layer consists of the CUDA runtime libraries that Ollama bundles directly—one of Ollama's key advantages is that it includes its own CUDA runtime, requiring only a compatible NVIDIA driver on the host system without needing a separate CUDA Toolkit installation. The top layer runs Ollama itself, which automatically detects and utilizes available GPU resources when properly configured.
What makes Ollama particularly attractive for enterprise deployments is its zero-configuration GPU detection: with drivers correctly installed, Ollama finds and uses NVIDIA GPUs automatically, eliminating the manual CUDA path configuration or separate CUDA Toolkit installation that complicates other frameworks. However, this automation becomes a liability when driver issues exist, because Ollama silently falls back to CPU mode without obvious error messages.
Performance Impact: GPU vs CPU Inference
The performance differential between GPU and CPU inference dramatically impacts user experience and infrastructure costs:
- GPU-Accelerated Inference: 30-80 tokens/second for 7B models on RTX 3060-class hardware, enabling real-time interactive experiences
- CPU-Only Inference: 3-8 tokens/second for the same models, creating frustrating multi-minute wait times for complex queries
This 5-10x performance gap transforms Ollama from a barely-usable curiosity into a production-viable AI infrastructure component. For businesses deploying local AI for customer service, code generation, or document analysis, GPU acceleration isn't optional—it's essential for acceptable response times.
Prerequisites and System Requirements
Successful Ollama deployment with GPU acceleration requires careful attention to hardware specifications and system configuration. Underprovisioned systems create frustrating performance bottlenecks that negate the benefits of local LLM deployment.
NVIDIA GPU Requirements
Not all NVIDIA GPUs support Ollama's CUDA-based acceleration. Ollama requires NVIDIA GPUs with CUDA Compute Capability 5.0 or higher, which includes:
Recommended GPU Series (Excellent Support)
- NVIDIA RTX 40-series (4090, 4080, 4070 Ti, 4060 Ti)
- NVIDIA RTX 30-series (3090, 3080, 3070, 3060)
- NVIDIA RTX 20-series (2080 Ti, 2070, 2060)
- NVIDIA Tesla/Data Center GPUs (A100, A40, L40S, L4)
Older Supported GPUs (Limited Performance)
- NVIDIA GTX 16-series (1660 Ti, 1650)
- NVIDIA GTX 10-series (1080 Ti, 1070, 1060 6GB)
- NVIDIA Tesla P100 (data center)
- Minimum 6GB VRAM recommended for practical usage
Verify your GPU's compute capability by checking your GPU model against NVIDIA's official CUDA GPUs list. You can identify your GPU on Ubuntu with:
lspci | grep -i nvidia
Example output: 01:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650]
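If a driver is already installed, recent nvidia-smi builds can report compute capability directly. The query below assumes a driver new enough to expose the compute_cap field; fall back to NVIDIA's CUDA GPUs list if it errors.
# Print GPU name and CUDA compute capability (supported on recent drivers)
nvidia-smi --query-gpu=name,compute_cap --format=csv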
Memory and Storage Requirements
Minimum requirements are 4GB RAM and 10GB disk space, but this only supports tiny models. Recommended specs are 16GB RAM, 50GB disk space, and an NVIDIA GPU with 8GB+ VRAM. For production deployments running multiple concurrent models, consider:
VRAM Requirements by Model Size:
- 7B parameter models (Llama 2, Mistral): 6-8GB VRAM
- 13B parameter models (Llama 2 13B, Vicuna 13B): 12-16GB VRAM
- 34B parameter models (Code Llama 34B): 20-24GB VRAM
- 70B parameter models (Llama 2 70B): 48GB+ VRAM (multi-GPU or CPU)
System RAM Recommendations:
- 16GB RAM: Single 7B model with comfortable headroom
- 32GB RAM: Multiple models or 13B models
- 64GB+ RAM: Enterprise deployments with model switching
Storage considerations extend beyond model files. Each LLM download consumes 4-40GB depending on parameter count and quantization, and individual models can exceed 20GB, so keep at least 30-50GB free on a fast drive (ideally an SSD). Plan for 100GB+ of NVMe SSD storage on production systems managing multiple model variants.
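Before committing to a model lineup, a quick capacity check against the numbers above takes seconds. These are standard tools; the filesystem path is only an example, since your models may live elsewhere.
# Available system RAM and swap
free -h
# Free disk space on the root filesystem (adjust the path if models are stored on another volume)
df -h /
# Total VRAM, once the NVIDIA driver is installed
nvidia-smi --query-gpu=memory.total --format=csv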
The Critical First Step: NVIDIA Driver Installation
This is where Ollama deployments succeed or fail. The temptation to skip directly to Ollama installation is strong, but without properly configured NVIDIA drivers, your expensive GPU becomes an idle spectator while your CPU struggles with inference workloads.
Critical Warning: Driver Installation Order
Installing Ollama before NVIDIA drivers doesn't prevent GPU detection—Ollama will detect GPUs when drivers are later installed. However, this approach creates debugging complexity and risks environment variable conflicts.
Always install and verify NVIDIA drivers before installing Ollama. This workflow eliminates ambiguity about whether GPU detection failures stem from driver issues or Ollama configuration problems.
GPU Memory Management
Ollama automatically manages GPU memory allocation, but custom limits prevent out-of-memory errors in shared GPU environments or when running multiple applications.
Set GPU memory fraction limit:
sudo systemctl edit ollama
Add memory fraction configuration:
[Service]
Environment="OLLAMA_GPU_MEMORY_FRACTION=0.8"
This limits Ollama to 80% of available VRAM, reserving 20% for other applications or system overhead. Recommended values:
- Dedicated AI workstation: 0.9 (90% allocation)
- Shared development machine: 0.7 (70% allocation)
- Multi-tenant server: 0.5 (50% allocation)
For models exceeding available VRAM, Ollama automatically offloads layers to CPU. Configure layer distribution manually for optimal hybrid performance.
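One way to steer layer distribution is the num_gpu option, which Ollama accepts in API request options to set how many layers are offloaded to the GPU. The sketch below is illustrative only; the model tag and the layer count of 20 are placeholders to tune against the VRAM headroom you actually have.
# Offload only 20 layers to the GPU for this request, keeping the remainder on CPU
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama2:13b",
  "prompt": "Summarize the benefits of hybrid CPU/GPU inference.",
  "stream": false,
  "options": { "num_gpu": 20 }
}'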
Model Keep-Alive Configuration
Ollama unloads models from VRAM after inactivity to free resources. The default keep-alive duration is 5 minutes. You can adjust this system-wide by configuring the OLLAMA_KEEP_ALIVE environment variable in the systemd service:
sudo systemctl edit ollama
Add the keep-alive configuration:
[Service]
Environment="OLLAMA_KEEP_ALIVE=60m"
Restart Ollama to apply changes:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Common keep-alive settings:
- OLLAMA_KEEP_ALIVE=60m: keep the model loaded for 1 hour
- OLLAMA_KEEP_ALIVE=-1: keep the model loaded indefinitely
- OLLAMA_KEEP_ALIVE=0: unload immediately after inference
Choose settings based on your usage patterns—longer keep-alive reduces model loading latency for frequent queries but consumes VRAM continuously.
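Keep-alive can also be overridden per request through the API's keep_alive field, which is handy when only a few workloads need a model pinned in VRAM while everything else follows the system-wide default. A minimal example:
# Keep gemma:2b resident for 30 minutes after this request completes
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Hello",
  "stream": false,
  "keep_alive": "30m"
}'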
Integrating Open WebUI for Production Workflows
While Ollama's CLI interface works well for testing, production deployments benefit from web-based interfaces that support team collaboration, chat history, and model management. Open WebUI provides a modern, ChatGPT-like interface for Ollama.
Docker-Based Open WebUI Deployment
Open WebUI runs as a containerized application, simplifying deployment and isolating dependencies. First, ensure Docker is installed:
# Install Docker if not present
sudo apt install docker.io -y
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
newgrp docker
Deploy Open WebUI container:
docker run -d \
--network=host \
-v open-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Parameters explained:
- --network=host: simplifies Ollama connectivity by sharing host networking
- -v open-webui:/app/backend/data: persists chat history and settings
- -e OLLAMA_BASE_URL: configures the Ollama API endpoint
- --restart always: ensures automatic startup after reboots
Access Open WebUI at http://your-server-ip:8080. The first login automatically creates an admin account—secure this immediately in production deployments.
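If the interface is unreachable, inspecting the container is usually faster than redeploying; the standard Docker commands below cover the common failure modes.
# Confirm the container is running, then review its recent logs
docker ps --filter name=open-webui
docker logs --tail 50 open-webui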
Verifying End-to-End Functionality
After Open WebUI deployment, perform comprehensive testing to validate the entire stack:
1. Access Open WebUI and create the admin account
2. Select a model from the dropdown (it should show models pulled via the CLI)
3. Send a test prompt and monitor nvidia-smi for GPU activity
4. Verify response quality and generation speed match CLI performance
Connection issues usually stem from incorrect OLLAMA_BASE_URL settings or Docker networking problems. Verify Ollama is running with systemctl status ollama and test the API manually:
curl http://127.0.0.1:11434/api/tags
This should return JSON listing available models. No response indicates Ollama service issues or firewall blocking.
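When the curl test fails, confirm that something is actually listening on port 11434 and that the host firewall allows it. These are generic checks (ufw shown as an example; substitute your firewall of choice):
# Is anything listening on the Ollama port?
ss -tlnp | grep 11434
# Are firewall rules interfering?
sudo ufw status verbose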
Performance Benchmarking and Validation
Quantifying GPU acceleration benefits validates your installation success and helps capacity planning for model selection and concurrent user support.
Simple Performance Comparison
Benchmark GPU vs CPU inference with identical prompts:
# GPU-accelerated inference (default)
time ollama run gemma:2b "Write a Python function to calculate fibonacci numbers"
# Force CPU-only inference for comparison: hide the GPU from the Ollama service by setting an invalid device ID
# (add Environment="CUDA_VISIBLE_DEVICES=-1" via sudo systemctl edit ollama, restart the service, then rerun)
time ollama run gemma:2b "Write a Python function to calculate fibonacci numbers"
Expected performance characteristics on RTX 3060-class hardware:
GPU-Accelerated (RTX 3060 12GB):
- Generation speed: 45-60 tokens/second
- Total time for 200-token response: 3-5 seconds
- GPU utilization: 80-100% during generation
- Power draw: 120-150W
CPU-Only (AMD Ryzen 9 3900X):
- Generation speed: 5-8 tokens/second
- Total time for 200-token response: 25-40 seconds
- CPU utilization: 100% across all cores
- System responsiveness: Significantly degraded
The 6-10x performance improvement justifies GPU infrastructure investment for any serious local LLM deployment.
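For cleaner numbers than wall-clock timing, Ollama can report its own throughput: the --verbose flag on ollama run prints load time, prompt evaluation rate, and generation rate after each response, which makes GPU-vs-CPU comparisons straightforward.
# Print timing statistics, including eval rate in tokens/second, after the response
ollama run gemma:2b "Write a Python function to calculate fibonacci numbers" --verbose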
Monitoring Long-Term Performance
Implement continuous monitoring to detect performance degradation from driver updates, thermal throttling, or resource contention:
# Real-time GPU monitoring with 1-second refresh
watch -n 1 nvidia-smi
# Log GPU utilization to file for analysis
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used,temperature.gpu --format=csv --loop=10 > gpu_metrics.csv
Review logs periodically to identify thermal throttling (temperatures exceeding 80°C) or memory pressure indicating need for model optimization or hardware upgrades.
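Before a full monitoring stack is in place, a small cron-driven check is often enough for basic alerting. The script below is a minimal sketch: the filename, threshold, and use of syslog via logger are all choices you can adapt.
#!/bin/bash
# Example: save as /usr/local/bin/gpu-temp-check.sh and run from cron every 5 minutes
THRESHOLD=80
# Read the current temperature of the first GPU in plain numeric form
TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | head -n1)
if [ "$TEMP" -ge "$THRESHOLD" ]; then
    # Write a warning to syslog so existing log monitoring can pick it up
    logger -t gpu-temp "WARNING: GPU temperature ${TEMP}C exceeds ${THRESHOLD}C"
fi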
Production Deployment Checklist
Before deploying Ollama to production environments, validate these critical configuration points to ensure reliability, security, and performance.
Infrastructure Validation
- NVIDIA drivers verified: nvidia-smi shows the correct driver version and GPU detection
- Ollama GPU detection confirmed: test model shows GPU utilization during inference
- Performance benchmarked: Documented baseline performance for capacity planning
- Thermal monitoring configured: Alerts set for temperature thresholds
- Storage capacity planned: Adequate space allocated for model library growth
Security Hardening
- Firewall configured: Ollama port 11434 restricted to authorized networks only
- Reverse proxy deployed: HTTPS with authentication for external access
- Service isolation verified: Ollama runs under dedicated non-privileged user
- Update strategy documented: Procedure for testing driver updates before production deployment
- Backup procedures established: Model library and configuration backed up regularly
Operational Readiness
- Service auto-start configured: Ollama enabled in systemd for boot persistence
- Monitoring dashboards deployed: GPU metrics, service health visible to operations team
- Runbook documented: Troubleshooting procedures for common failure scenarios
- User training completed: Team familiar with model selection and prompt optimization
- Disaster recovery tested: Validated ability to restore service from backups
ITECS Managed AI Infrastructure Services
Deploying and maintaining local LLM infrastructure requires specialized expertise across GPU computing, driver management, system optimization, and operational monitoring. For enterprises seeking production-ready AI capabilities without dedicating internal IT resources to infrastructure complexity, ITECS provides comprehensive managed AI infrastructure services.
Enterprise Ollama Deployment Services
Our MSP ELITE package now includes professional Ollama infrastructure deployment and management, delivering turnkey AI capabilities backed by 24/7 expert support. ITECS eliminates the trial-and-error of driver configuration and provides enterprise-grade reliability from day one.
Managed AI Infrastructure Includes:
- Hardware Assessment and Procurement Guidance: Right-sized GPU selection for your model requirements and budget constraints
- Ubuntu 24.04 LTS Optimization: Custom-tuned OS configuration for AI workload performance
- NVIDIA Driver Management: Proactive driver testing and updates without production disruption
- High Availability Configuration: Redundant infrastructure with automatic failover capabilities
- Security Hardening: Network segmentation, authentication, and compliance-ready access controls
- Performance Monitoring: Real-time dashboards tracking GPU utilization, model performance, and capacity planning metrics
- Model Management: Curated model library with version control and rollback capabilities
- Integration Services: API connectivity to existing business applications and workflows
Why Choose ITECS for AI Infrastructure
Building internal expertise for GPU-accelerated AI infrastructure diverts resources from core business objectives. ITECS brings proven methodologies developed across hundreds of enterprise deployments, eliminating the costly learning curve that derails internal projects.
Internal Deployment Challenges
- 2-4 weeks learning curve for the driver ecosystem
- Trial-and-error hardware procurement decisions
- Unpredictable troubleshooting time for failures
- Knowledge concentration in single individuals
- Reactive response to performance degradation
- Security gaps from configuration oversights
ITECS Managed Infrastructure
- Production-ready deployment in 3-5 business days
- Pre-validated hardware configurations
- 24/7 expert support with <2 hour response SLA
- Deep bench strength across multiple engineers
- Proactive monitoring with predictive alerts
- Security best practices applied by default
Conclusion: Bridging Marketing Promises and Technical Reality
Ollama's one-command installation delivers on its promise—but only when prerequisite infrastructure is correctly configured. The single curl command truly does install Ollama in seconds. GPU acceleration, however, depends entirely on NVIDIA driver configuration that requires meticulous attention to Ubuntu 24.04 specifics, Secure Boot implications, and version compatibility nuances.
The performance differential between successful and failed GPU integration isn't marginal—it's transformative. CPU-only inference relegates Ollama to a curiosity unsuitable for interactive use. GPU-accelerated inference enables production deployment with response times matching cloud-based AI services while maintaining complete data privacy and eliminating recurring API costs.
Organizations evaluating local LLM deployment must budget not just for hardware acquisition, but for the expertise required to navigate driver ecosystems, performance optimization, and operational monitoring. The technical complexity isn't insurmountable, but it requires systematic attention to detail and ongoing maintenance that distracts from core business objectives.
Ready for Production-Grade Local AI?
ITECS transforms the complexity of Ollama deployment into turnkey AI infrastructure backed by enterprise-grade support. We handle driver management, performance optimization, security hardening, and 24/7 monitoring so your team can focus on leveraging AI capabilities rather than maintaining GPU infrastructure.
Whether deploying your first local LLM or scaling existing AI infrastructure, our MSP ELITE package delivers the expertise and reliability your business demands. Stop wrestling with driver conflicts and start deploying production AI applications.
The gap between Ollama's marketing simplicity and deployment reality is navigable with the right knowledge and preparation. This guide provides the technical foundation for successful GPU-accelerated LLM deployment on Ubuntu 24.04. For organizations prioritizing speed to production and operational reliability over internal capability building, managed services eliminate the learning curve and deliver immediate business value.
The future of enterprise AI increasingly favors local deployment for data sensitivity, cost predictability, and customization flexibility. With proper infrastructure foundation—whether self-managed or professionally deployed—Ollama transforms expensive cloud AI dependencies into owned capabilities that scale without recurring costs.
Method 1: The ubuntu-drivers Tool (Recommended)
The ubuntu-drivers tool is recommended if your computer uses Secure Boot, since it always tries to install signed drivers which are known to work with Secure Boot. This method provides the most reliable path for Ubuntu 24.04 deployments with minimal manual intervention.
Step 1: Update system packages
sudo apt update && sudo apt upgrade -y
Step 2: Identify available NVIDIA drivers for your hardware
ubuntu-drivers devices
This command outputs your GPU model and lists compatible driver versions with recommendations. Example output:
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor : NVIDIA Corporation
model : TU117 [GeForce GTX 1650]
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-550 - distro non-free recommended
Step 3: Install the recommended driver automatically
sudo ubuntu-drivers install
This command automatically selects and installs the recommended driver marked in the previous output. For manual driver version selection:
sudo apt install nvidia-driver-550 -y
Step 4: Reboot to load the new kernel modules
sudo reboot
The reboot is mandatory—NVIDIA kernel modules require a restart to initialize properly. Skipping this step causes cryptic errors later.
Method 2: Graphics Drivers PPA (Latest Drivers)
For users requiring cutting-edge driver features or newer hardware support not yet in Ubuntu's default repositories, the graphics-drivers PPA provides access to the latest tested drivers. The PPA approach has more consistent results when combined with proper installation sequencing.
Step 1: Add the graphics-drivers PPA repository
sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt update
Step 2: Install the desired driver version
sudo apt install nvidia-driver-560 -y
Check the graphics-drivers PPA Launchpad page to identify the latest tested driver version before installation. Replace "560" with your preferred version.
Step 3: Reinstall kernel headers and update initramfs
sudo apt reinstall linux-headers-$(uname -r) -y
sudo update-initramfs -u
These commands ensure DKMS (Dynamic Kernel Module Support) correctly builds NVIDIA modules for your running kernel version—critical for successful driver operation.
Step 4: Verify DKMS module compilation
dkms status
Successful output shows nvidia/560: added or nvidia/560: installed. If the module is missing, the driver won't function after reboot.
Step 5: Reboot the system
sudo reboot
Critical Verification: The nvidia-smi Test
After reboot, immediately verify driver functionality before proceeding to Ollama installation. This single command reveals whether your driver stack is correctly configured:
nvidia-smi
Successful output displays comprehensive GPU information:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 30% 42C P8 15W / 170W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
Key information to verify:
- Driver Version: Confirms loaded driver matches your installation
- CUDA Version: Shows maximum supported CUDA runtime (Ollama uses bundled CUDA libraries, so exact match isn't required)
- Memory-Usage: Displays total VRAM capacity—critical for model size planning
- GPU-Util: Current GPU utilization percentage (should be 0% when idle)
Troubleshooting: If nvidia-smi fails with "command not found" or "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver," your driver installation failed. Do not proceed to Ollama installation—diagnose and fix driver issues first.
Additional verification checks:
# Verify kernel module loaded
lsmod | grep nvidia
# Check NVIDIA device files
ls -la /dev/nvidia*
You should see multiple nvidia* modules loaded and device files in /dev/. Missing modules or device files indicate incomplete driver initialization.
Installing Ollama: The Easy Part
With NVIDIA drivers correctly configured and verified, Ollama installation becomes genuinely straightforward. The single-command installation isn't marketing hype—it's an accurate representation of Ollama's streamlined deployment process when prerequisites are met.
One-Command Installation
curl -fsSL https://ollama.com/install.sh | sh
This command performs multiple operations automatically:
- Downloads the latest Ollama binary optimized for Linux AMD64 architecture
- Installs the binary to /usr/local/bin/ollama with executable permissions
- Creates a dedicated ollama system user and group for service isolation
- Installs and enables a systemd service for automatic startup
- Configures the model storage directory at /usr/share/ollama/.ollama/models
The installation script includes error handling and typically completes in 10-30 seconds on modern systems with good internet connectivity. Installation output confirms successful service creation and startup.
Verifying Ollama Installation and GPU Detection
Immediately after installation, verify Ollama correctly detects your NVIDIA GPU. When GPU detection fails, Ollama quietly falls back to CPU inference rather than raising an obvious error, so thorough verification now prevents frustrating debugging sessions later.
Step 1: Check Ollama service status
systemctl status ollama
Output should show active (running) status. If the service failed to start, examine the logs with journalctl -xeu ollama.
Step 2: Test GPU detection with a model pull
ollama pull gemma:2b
This downloads Gemma 2B, a compact model perfect for verification testing. The 2B variant requires only ~1.7GB VRAM, ensuring compatibility with entry-level GPUs while providing meaningful performance comparison between CPU and GPU inference.
Step 3: Run inference and monitor GPU utilization
Open a second terminal and start real-time GPU monitoring:
watch -n 1 nvidia-smi
In your primary terminal, run an inference test:
ollama run gemma:2b "Explain quantum computing in simple terms"
While the model generates output, observe your monitoring terminal. Successful GPU acceleration shows:
- Memory-Usage increase: VRAM allocation grows by 2-3GB as the model loads into GPU memory
- GPU-Util spikes: Utilization reaches 60-100% during active inference
- Processes section lists /usr/local/bin/ollama as an active GPU process
- Rapid token generation: text appears quickly, indicating GPU-accelerated performance
GPU Detection Failure Symptoms:
- nvidia-smi shows 0% GPU utilization during inference
- Memory-Usage remains at 0MiB throughout model execution
- Token generation is extremely slow (3-8 tokens/second)
- Ollama logs show "no compatible GPUs were discovered"
If experiencing these symptoms, do not proceed—diagnose GPU detection issues immediately using the troubleshooting section below.
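The quickest place to see what Ollama decided about your GPU is its own service log. The filter below is just a convenience; the exact wording of detection messages can vary between Ollama releases.
# Review recent Ollama log entries related to GPU/CUDA detection
journalctl -u ollama --no-pager -n 200 | grep -iE "cuda|gpu|nvidia"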
Common Driver Pitfalls and Solutions
Even following correct installation procedures, specific driver-related issues plague Ubuntu 24.04 Ollama deployments. Understanding these pitfalls and their solutions prevents hours of frustrating troubleshooting.
Pitfall #1: Nouveau Driver Conflicts
Ubuntu ships with the open-source Nouveau driver for NVIDIA GPUs, which conflicts with the proprietary NVIDIA driver when both are loaded. While ubuntu-drivers typically handles this automatically, manual installations or upgrade scenarios sometimes leave Nouveau active.
Detection: Check for Nouveau module loading
lsmod | grep nouveau
Any output indicates Nouveau is loaded. Proceed with blacklisting.
Solution: Blacklist Nouveau driver permanently
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u
sudo reboot
After reboot, verify Nouveau is no longer loaded and NVIDIA driver functions correctly.
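A quick post-reboot check confirms the blacklist took effect: the Nouveau lookup should return nothing, while the proprietary modules should be present.
# Should produce no output if the blacklist worked
lsmod | grep nouveau
# Should list nvidia, nvidia_uvm, nvidia_drm, and nvidia_modeset
lsmod | grep nvidia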
Pitfall #2: Driver Version Mismatches
API mismatch errors occur when the userspace driver packages were upgraded while the kernel module is still on the older version. This situation often occurs after a system upgrade. Ollama may refuse to detect GPUs when driver component versions don't align.
Detection: Check for version mismatches
# Check userspace driver version
nvidia-smi
# Check kernel module version
cat /proc/driver/nvidia/version
Mismatched version numbers indicate the problem. Kernel logs may show: NVRM: API mismatch: the client has the version 570.172.08, but this kernel module has the version 570.158.01
Solution: Synchronize driver versions through reboot
sudo reboot
Simple reboot typically resolves version mismatches by loading updated kernel modules. If issues persist, reinstall the driver package:
sudo apt install --reinstall nvidia-driver-550
Pitfall #3: Secure Boot Complications
Modern Ubuntu systems enable Secure Boot by default, requiring kernel module signatures for loading. Unsigned NVIDIA modules trigger boot failures or prevent driver initialization. Ubuntu's ubuntu-drivers tool automatically handles Secure Boot by installing signed drivers from repositories, but manual installations or custom driver versions cause problems.
Detection: Check Secure Boot status
mokutil --sb-state
Output showing SecureBoot enabled requires signed kernel modules.
Solution Option 1: Use signed drivers from Ubuntu repositories (recommended)
Install drivers via ubuntu-drivers or official Ubuntu packages, which include proper Secure Boot signatures.
Solution Option 2: Disable Secure Boot (less secure)
Access UEFI/BIOS settings during boot (typically F2, F10, or Del key) and disable Secure Boot. This enables unsigned module loading but reduces system security posture—not recommended for production deployments.
Pitfall #4: Suspend/Resume GPU Loss
On Linux, after a suspend/resume cycle, sometimes Ollama will fail to discover your NVIDIA GPU and fallback to running on the CPU due to a driver bug. This affects laptops and workstations using system suspend, causing mysterious performance degradation after resume.
Detection: GPU detected at boot but not after suspend/resume cycle
Solution: Reload NVIDIA UVM driver
sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
For persistent solution, create a systemd service to reload UVM after resume:
sudo nano /etc/systemd/system/nvidia-resume.service
Add configuration:
[Unit]
Description=Reload NVIDIA UVM after resume
After=suspend.target hibernate.target hybrid-sleep.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c '/sbin/rmmod nvidia_uvm; /sbin/modprobe nvidia_uvm'
[Install]
WantedBy=suspend.target hibernate.target hybrid-sleep.target
Enable the service:
sudo systemctl enable nvidia-resume.service
Pitfall #5: Specific Driver Version Incompatibilities
Not all driver versions maintain equal compatibility with Ollama's bundled CUDA libraries. Users have reported that after upgrading to certain releases (the 555 driver series, for example), Ollama could no longer use the GPU even though nvidia-smi still worked, and downgrading to the previous stable series (such as 550) resolved the problem.
Detection: Ollama suddenly stops detecting GPU after driver update, despite nvidia-smi functioning correctly
Solution: Downgrade to known-good driver version
# Remove problematic driver
sudo apt remove --purge nvidia-driver-555
sudo apt autoremove
# Install known-compatible version
sudo apt install nvidia-driver-550
sudo reboot
Hold the driver package to prevent automatic upgrades:
sudo apt-mark hold nvidia-driver-550
Monitor Ollama GitHub issues and release notes before upgrading drivers in production environments.
Ollama Configuration and Optimization
Beyond basic installation, several configuration adjustments optimize Ollama for enterprise deployment scenarios and specific hardware configurations.
Configuring Network Access
By default, Ollama binds to localhost only. To make it reachable from other machines or containers, you must configure the OLLAMA_HOST environment variable. This enables web UI integrations, API access from other machines, and containerized client connections.
Edit the systemd service configuration:
sudo systemctl edit ollama
Add override configuration:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Restart Ollama to apply changes:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Security Warning: Binding to 0.0.0.0 exposes Ollama to all network interfaces. Configure firewall rules appropriately and consider implementing reverse proxy authentication for production deployments.
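With ufw, Ubuntu's default firewall frontend, restricting the API to a trusted subnet is a single rule. The 10.0.0.0/24 range below is a placeholder for your own management network, and allowing SSH first assumes you administer the host remotely with the OpenSSH profile available.
# Keep remote administration reachable before enabling the firewall
sudo ufw allow OpenSSH
# Allow the Ollama API only from a trusted subnet (example range)
sudo ufw allow from 10.0.0.0/24 to any port 11434 proto tcp
sudo ufw enable
sudo ufw status numbered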
Multi-GPU Configuration
Systems with multiple NVIDIA GPUs require explicit configuration to control which GPUs Ollama utilizes. Set CUDA_VISIBLE_DEVICES to a comma-separated list of GPUs. UUIDs are more reliable than numeric IDs for consistent GPU identification.
Identify GPU UUIDs:
nvidia-smi -L
Example output:
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-abc12345-6789-0def-ghij-klmn01234567)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-xyz98765-4321-0fed-jihg-mnlk76543210)
Configure Ollama to use specific GPUs:
sudo systemctl edit ollama
Add environment variable:
[Service]
Environment="CUDA_VISIBLE_DEVICES=GPU-abc12345-6789-0def-ghij-klmn01234567"
For multiple GPUs, use comma-separated UUID list. Restart Ollama after configuration changes.
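After restarting the service, confirm the restriction took effect by watching which card actually allocates memory during an inference run, or by reviewing what Ollama logged about visible devices at startup. Only the selected GPU should show the ollama process and VRAM usage.
# Watch per-GPU memory and processes while running a test prompt in another terminal
watch -n 1 nvidia-smi
# Check recent startup log entries for the devices Ollama discovered
journalctl -u ollama --no-pager -n 100 | grep -iE "cuda|gpu"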