llama-cpp-python
CUDA llama-cpp-python build failed.
I am trying to build llama-cpp-python with CUDA and it is failing. I have tried some of the suggestions here for similar issues, but they aren't working, and I don't see what is wrong in the output. My system is Linux Mint 21.3. My CUDA graphics driver version is 12.5 and my CUDA toolkit version is 11.5.
These are the commands I used for the installation.
python3 -m venv venv
source venv/bin/activate
CMAKE_ARGS="-DGGML_CUDA=on" LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu" pip install llama-cpp-python==0.3.4 --verbose
The output is here
https://gist.github.com/Ado012/70492020c3567bc60f6daa2ba89e2be5
Are you sure you got the nvcc configuration right?
You can check with which nvcc or nvcc --version. If there is no output, add the following to your .bashrc file (replace cuda-12.6 with your version; check with nvidia-smi):
export PATH="/usr/local/cuda-12.6/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH"
Then try reinstalling with the same installation command.
So I did that, and the build now completes, but it seems to be fetching and building a CPU version rather than a CUDA version no matter what I try. The GPU apparently remains unutilized when I run this version.
Here is the command I used.
set CMAKE_ARGS="-DGGML_CUDA=on" && set FORCE_CMAKE=1 && pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache
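(Note: `set VAR=value` is cmd.exe syntax; in bash it doesn't export anything into pip's environment, which may explain why the flags aren't taking effect. A minimal sketch of passing the variables inline instead:)

```shell
# `set CMAKE_ARGS=...` is a cmd.exe idiom; in bash it exports nothing,
# so pip never sees the flag. Prefixing the variables inline on the same
# line does work -- a quick demonstration that the child process sees it:
CMAKE_ARGS="-DGGML_CUDA=on" sh -c 'echo "child sees: $CMAKE_ARGS"'
```

With the inline form, the install command would be `CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache-dir`.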
Here is the output of the build
https://gist.github.com/Ado012/8bb05396886b1d69eead03ffe9dd90e3
Here is the output of nvcc --version
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Here is the entry in .bashrc (along with an earlier addition I had made):
#added for bitsandbytes and aivoicecloning 052923
LD_LIBRARY_PATH='/usr/lib/x86_64-linux-gnu/'
export LD_LIBRARY_PATH
#added for llama-cpp-python 032825 does this overwrite above?
export PATH="/usr/local/cuda-12.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
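(To answer the question in the comment above: the later export does not overwrite the earlier one, it prepends to it, because `$LD_LIBRARY_PATH` on the right-hand side expands to the existing value. A small sketch:)

```shell
# The earlier .bashrc entry sets the base value...
LD_LIBRARY_PATH='/usr/lib/x86_64-linux-gnu/'
export LD_LIBRARY_PATH
# ...and the later entry prepends the CUDA directory rather than
# overwriting, since it expands the existing value on the right:
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH"
# -> /usr/local/cuda-12.5/lib64:/usr/lib/x86_64-linux-gnu/
```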
Here is the run output: https://gist.github.com/Ado012/3c08b61df8dd35a6813329a4b4d3cf33
Additionally, I tried building llama-cpp-python in Google Colab (CUDA 12.5) to see whether the issue was the lower CUDA version, as some suggested. This didn't work either, although maybe for different reasons.
Colab build command
!set CMAKE_ARGS="-DGGML_CUDA=on" && set FORCE_CMAKE=1 && venv/bin/pip install llama-cpp-python[server]==0.2.90 --verbose --force-reinstall --no-cache-dir
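(In a notebook, each `!` line also runs in its own shell, so `set` assignments would not persist to the pip line even if the syntax were right. One workaround, as a sketch, is to set the build flags from Python in the kernel's environment, which shells spawned by later `!` lines inherit:)

```python
import os

# Set the build flags in the notebook kernel's environment; shells
# spawned by subsequent `!` lines inherit these variables, so a plain
# `!pip install llama-cpp-python==0.2.90 --verbose --force-reinstall --no-cache-dir`
# in a later cell will see them.
os.environ["CMAKE_ARGS"] = "-DGGML_CUDA=on"
os.environ["FORCE_CMAKE"] = "1"
print(os.environ["CMAKE_ARGS"])
```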
Colab build output https://gist.github.com/Ado012/3f98a1a2bdfda9aa25d993345b760ec9
Colab run output https://gist.github.com/Ado012/8fd3ceef8d2f9d47b874e97a9a401dc0
Again the GPU is not being used.
PS: I can't use a prebuilt CUDA wheel because my CUDA version isn't the right one for any of them, plus I want to maintain compatibility with as wide a range of systems as possible.
I'm trying to do about the same with 12.6, and with no AVX at all (because my CPU just can't). I've only managed to get a ~22 MB build that is surely just the CPU version.
In the past I built 0.3.2 with 12.6, pretty much by accident I think. Unfortunately I don't remember how, but since I haven't changed much in the past 5 months, the system is basically as it was.
It builds the CPU version (I presume, or is it just a wrapper?) via the x64 Command Prompt for VS 2022, but I can't convince it to actually build the GPU version. The Developer PowerShell just does its crashy thing and can't pick the correct x64 cl.exe no matter what, so I don't even attempt that (no clue why; everything in the system points to that x64 compiler, but it just nopes out).
I've built llama.cpp itself recently, which I suppose was a correct build, given that it gave me the quantize.exe I needed (yes, I needed to build that too, because of the no-AVX requirement). But llama-cpp-python with CUDA for the GPU seems impossible right now.
I tried to build llama-cpp-python in Google Colab using different commands. What worked for me was using uv. This code ran in about 15 seconds.
!pip install uv
!uv init
!CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=75" FORCE_CMAKE=1 \
uv pip install --upgrade --force-reinstall llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
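(To check whether the wheel that actually got installed has CUDA support, recent llama-cpp-python releases expose a `llama_supports_gpu_offload` helper in the bindings. A minimal sketch, assuming the package imports cleanly:)

```python
# Sketch: verify the installed wheel was built with GPU support.
# llama_supports_gpu_offload() returns True only when the compiled
# backend can offload layers to the GPU.
try:
    from llama_cpp import llama_supports_gpu_offload
    print("GPU offload supported:", llama_supports_gpu_offload())
except ImportError:
    print("llama-cpp-python is not installed in this environment")
```

If this prints False after an install, the CPU wheel was built or fetched regardless of the flags.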