llama-cpp-python
CUDA llama-cpp-python build failed.
I am trying to build llama-cpp-python with CUDA and it is failing. I have tried some of the suggestions here for similar issues, but they aren't working, and I don't see what is wrong in the output. My system is Linux Mint 21.3. My CUDA graphics driver version is 12.5 and my CUDA toolkit version is 11.5.
These are the commands I used for the installation.
python3 -m venv venv
source venv/bin/activate
CMAKE_ARGS="-DGGML_CUDA=on" LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu" pip install llama-cpp-python==0.3.4 --verbose
The output is here
https://gist.github.com/Ado012/70492020c3567bc60f6daa2ba89e2be5
Are you sure you got the nvcc configuration right?
You can check with which nvcc or nvcc --version. If there is no output, add the following to your .bashrc file (replace cuda-12.6 with your version; check with nvidia-smi):
export PATH="/usr/local/cuda-12.6/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH"
Then try reinstalling with the same installation command.
So I did that, and the build now completes, but it seems to be fetching and building a CPU version rather than a CUDA version no matter what I try. The GPU apparently remains unutilized when I run this version.
Here is the command I used.
set CMAKE_ARGS="-DGGML_CUDA=on" && set FORCE_CMAKE=1 && pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache
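(Note: `set VAR=value` is cmd.exe syntax; in bash it doesn't export anything into pip's environment, which may explain why the flags aren't taking effect. A minimal sketch of passing the variables inline instead:)

```shell
# `set CMAKE_ARGS=...` is a cmd.exe idiom; in bash it exports nothing,
# so pip never sees the flag. Prefixing the variables inline on the same
# line does work -- a quick demonstration that the child process sees it:
CMAKE_ARGS="-DGGML_CUDA=on" sh -c 'echo "child sees: $CMAKE_ARGS"'
```

With the inline form, the install command would be `CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache-dir`.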
Here is the output of the build
https://gist.github.com/Ado012/8bb05396886b1d69eead03ffe9dd90e3
Here is the output of nvcc --version
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Here is the entry in .bashrc (along with an earlier addition I had made):
#added for bitsandbytes and aivoicecloning 052923
LD_LIBRARY_PATH='/usr/lib/x86_64-linux-gnu/'
export LD_LIBRARY_PATH
#added for llama-cpp-python 032825 does this overwrite above?
export PATH="/usr/local/cuda-12.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
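(To answer the question in the comment above: the later export does not overwrite the earlier one, it prepends to it, because `$LD_LIBRARY_PATH` on the right-hand side expands to the existing value. A small sketch:)

```shell
# The earlier .bashrc entry sets the base value...
LD_LIBRARY_PATH='/usr/lib/x86_64-linux-gnu/'
export LD_LIBRARY_PATH
# ...and the later entry prepends the CUDA directory rather than
# overwriting, since it expands the existing value on the right:
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH"
# -> /usr/local/cuda-12.5/lib64:/usr/lib/x86_64-linux-gnu/
```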
Here is the run output: https://gist.github.com/Ado012/3c08b61df8dd35a6813329a4b4d3cf33
Additionally, I tried building llama-cpp-python in Google Colab (CUDA 12.5) to see whether the issue was the lower CUDA version, as some suggested. This didn't work either, although maybe for different reasons.
Colab build command
!set CMAKE_ARGS="-DGGML_CUDA=on" && set FORCE_CMAKE=1 && venv/bin/pip install llama-cpp-python[server]==0.2.90 --verbose --force-reinstall --no-cache-dir
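(In a notebook, each `!` line also runs in its own shell, so `set` assignments would not persist to the pip line even if the syntax were right. One workaround, as a sketch, is to set the build flags from Python in the kernel's environment, which shells spawned by later `!` lines inherit:)

```python
import os

# Set the build flags in the notebook kernel's environment; shells
# spawned by subsequent `!` lines inherit these variables, so a plain
# `!pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache-dir`
# in a later cell will see them.
os.environ["CMAKE_ARGS"] = "-DGGML_CUDA=on"
os.environ["FORCE_CMAKE"] = "1"
print(os.environ["CMAKE_ARGS"])
```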
Colab build output https://gist.github.com/Ado012/3f98a1a2bdfda9aa25d993345b760ec9
Colab run output https://gist.github.com/Ado012/8fd3ceef8d2f9d47b874e97a9a401dc0
Again the GPU is not being used.
PS: I can't use a prebuilt CUDA wheel because my CUDA version isn't the right one for any of them, plus I want to maintain compatibility with as wide a range of systems as possible.
I'm trying to do about the same with 12.6, and with no AVX at all (because my CPU just can't). I've only managed to get a ~22 MB build that is surely just the CPU version.
In the past I built 0.3.2 with 12.6, pretty much by accident I think. Unfortunately I don't remember how, but since I haven't changed much in the past 5 months, the system is basically as it was.
It builds the CPU version (I presume, or is it just a wrapper?) via the x64 Command Prompt for VS 2022, but I can't convince it to actually build the GPU version. The Developer PowerShell just does its crashy thing and can't pick the correct x64 cl.exe no matter what, so I don't even attempt that (no clue why; everything in the system points to that x64 compiler, but it just nopes out).
I've built llama.cpp itself recently, which I suppose was a correct build, given that it gave me the quantize.exe I needed (yes, I needed to build that too, because of the no-AVX requirement). But llama-cpp-python with CUDA for the GPU seems impossible right now.
I tried to build llama-cpp-python in Google Colab using different commands. What worked for me was using uv. This code ran in about 15 seconds.
!pip install uv
!uv init
!CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=75" FORCE_CMAKE=1 \
uv pip install --upgrade --force-reinstall llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
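(To check whether the wheel that actually got installed has CUDA support, recent llama-cpp-python releases expose a `llama_supports_gpu_offload` helper in the bindings. A minimal sketch, assuming the package imports cleanly:)

```python
# Sketch: verify the installed wheel was built with GPU support.
# llama_supports_gpu_offload() returns True only when the compiled
# backend can offload layers to the GPU.
try:
    from llama_cpp import llama_supports_gpu_offload
    print("GPU offload supported:", llama_supports_gpu_offload())
except ImportError:
    print("llama-cpp-python is not installed in this environment")
```

If this prints False after an install, the CPU wheel was built or fetched regardless of the flags.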