
ggml_cuda_init: failed to initialize CUDA: (null) on Windows with CUDA 12.9


System Information:

  • OS: Windows
  • GPU: NVIDIA GeForce RTX 5060 Ti
  • NVIDIA Driver Version: 577.00
  • CUDA Version (from nvidia-smi): 12.9
  • Python Version: 3.12
  • Visual Studio: Visual Studio 2019 with "Desktop development with C++" workload

Problem Description: I am unable to get llama-cpp-python to use my GPU. When I run a script to load a model with `n_gpu_layers=-1`, I get the error `ggml_cuda_init: failed to initialize CUDA: (null)`, and all layers are loaded on the CPU.
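
One check that may help narrow this down is asking the installed build whether it was compiled with GPU offload at all, which separates a CPU-only wheel from a CUDA build that fails at runtime. A minimal sketch, assuming this version of the bindings exposes the low-level `llama_supports_gpu_offload` function:

```python
# Diagnostic sketch: distinguishes a CPU-only build from a CUDA build
# that fails to initialize at runtime. Assumes the installed llama_cpp
# exposes the low-level llama_supports_gpu_offload binding.
import llama_cpp

print(llama_cpp.llama_supports_gpu_offload())
# False -> the wheel was built without CUDA (a build/install problem)
# True  -> the build includes CUDA, so initialization fails at runtime
```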

Troubleshooting Steps Taken:

  1. Installed llama-cpp-python using the following command in the "x64 Native Tools Command Prompt for VS 2019" with a Python virtual environment activated (see the note on cmd quoting after the script below): `set CMAKE_ARGS="-DGGML_CUDA=on" && pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir`
  2. Verified that the command completes successfully, but the resulting installation does not use the GPU.
  3. Tried using the deprecated `LLAMA_CUBLAS` flag, which resulted in a build error (as expected).
  4. Performed a full cleanup of the environment:
    • pip uninstall llama-cpp-python
    • pip cache purge
    • Manually deleted leftover `~*` directories from site-packages.
  5. Reinstalled after the cleanup, but the problem persists.
  6. Installed PyTorch with CUDA 12.1 support (`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`) before reinstalling llama-cpp-python, but this did not resolve the issue.
  7. Confirmed that the correct Python interpreter and virtual environment are being used.
  8. The run_with_llama_cpp.py script being used is:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
    verbose=True
)

output = llm(
    "AI is going to ",
    max_tokens=32,
    stop=["."],
    echo=True
)

print(output)
```
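
One thing worth double-checking about step 1: in cmd.exe, everything after the `=` in a `set` statement is taken literally, so `set CMAKE_ARGS="-DGGML_CUDA=on"` stores the surrounding quotes as part of the value, which CMake may then misparse and silently fall back to a CPU-only build. A minimal sketch to inspect the variable from the same shell (the commented outputs are what I would expect for each form):

```python
# Run from the same cmd.exe session used for the install.
# In cmd, `set CMAKE_ARGS="-DGGML_CUDA=on"` keeps the quotes in the value;
# `set CMAKE_ARGS=-DGGML_CUDA=on` (no quotes) does not.
import os

print(repr(os.environ.get("CMAKE_ARGS")))
# Quoted form   -> '"-DGGML_CUDA=on"'  (quotes included)
# Unquoted form -> '-DGGML_CUDA=on'
```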

Request: Could you please provide any insights into why the CUDA initialization might be failing, or suggest any further diagnostic steps? I can provide the full verbose build log if needed.
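
As an additional data point, a driver-level check from the same venv (assuming the cu121 PyTorch from step 6 is still installed) would show whether any library can initialize CUDA there, separating a driver/environment problem from a llama-cpp-python build problem:

```python
# Driver-level sanity check; assumes the cu121 PyTorch from step 6 is
# still installed in this venv.
import torch

if torch.cuda.is_available():
    print("CUDA initialized:", torch.cuda.get_device_name(0))  # expect the RTX 5060 Ti
else:
    print("PyTorch cannot initialize CUDA either -> likely a driver/toolkit issue")
```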
