llama-cpp-python
How to install the latest version with GPU support
Hey, I've been struggling for a month to install the latest version with CUDA. It was a nightmare.
So here is a guide on how to do it.
TL;DR Dockerfile syntax:
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
&& apt-get clean
RUN pip install uv
RUN uv init .
# Note: 'export' inside a RUN instruction does not persist into later Docker
# layers, so set CC/CXX with ENV; LD_LIBRARY_PATH needs the $(gcc ...) shell
# substitution, so export it inside the same RUN as the build step below.
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
    && CMAKE_ARGS="-DGGML_CUDA=on \
    -DCMAKE_CUDA_ARCHITECTURES=75 \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
    uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
    --index-url https://pypi.org/simple \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
    --index-strategy unsafe-best-match
Explanation: installation for CPU is a piece of cake, but installation from source with GPU support is slow and labour-intensive, so the best way is to install from the provided wheels. The catch: GitHub doesn't serve the latest release as a wheel. The latest on PyPI was 0.3.8, while the GitHub wheel index stopped at 0.3.4, which is relatively recent but doesn't support Gemma 3.
First we need to provide paths to the gcc and g++ compilers. Somehow this is a dealbreaker:
ENV CC=/usr/bin/gcc CXX=/usr/bin/g++
RUN export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
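As a side note, what the $(gcc ...) substitutions in the LD_LIBRARY_PATH line expand to can be sketched in Python (paths vary per system; the gcc calls are guarded in case no compiler is on PATH):

```python
import shutil
import subprocess

def gcc_lib_dir(machine: str, version: str) -> str:
    # Mirrors /usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion)
    return f"/usr/lib/gcc/{machine}/{version}"

if shutil.which("gcc"):
    # -dumpmachine prints the target triplet (e.g. x86_64-linux-gnu),
    # -dumpversion the compiler's major version (e.g. 11).
    machine = subprocess.check_output(["gcc", "-dumpmachine"], text=True).strip()
    version = subprocess.check_output(["gcc", "-dumpversion"], text=True).strip()
    print(gcc_lib_dir(machine, version))
```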
Linux dependencies:
apt-get install -y build-essential \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors \
&& echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
Disabling the examples and tests shortens the build, and the traceback in case anything fails. And it likely will:
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
LLAMA_CUBLAS is obsolete, so we need to replace it with:
-DGGML_CUDA=on
uv somehow manages to install this while pip can't.
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match
Here we need to provide both --index-url https://pypi.org/simple and --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122, because otherwise either 0.3.8 or CUDA support would be invisible to the installer. Replace cu122 with your CUDA version. --index-strategy unsafe-best-match is also required; without it the build failed.
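Why unsafe-best-match matters can be sketched with a toy resolver. The index contents below are invented for illustration; the relevant behaviour is that uv gives extra indexes priority over the default index, and its default "first-index" strategy stops at the first index that carries the package at all:

```python
# Toy model of uv's two index strategies; version lists are hypothetical.
# The extra (CUDA wheel) index is consulted first.
INDEXES = [
    ("https://abetlen.github.io/llama-cpp-python/whl/cu122", ["0.3.4"]),
    ("https://pypi.org/simple", ["0.3.4", "0.3.8"]),
]

def first_index(indexes):
    # Default strategy: take candidates only from the first index that
    # carries the package, ignoring every later index.
    for url, versions in indexes:
        if versions:
            return [(url, v) for v in versions]
    return []

def unsafe_best_match(indexes):
    # Pool candidates from all indexes and let the resolver pick the best.
    return [(url, v) for url, versions in indexes for v in versions]

# A pin of ==0.3.8 is only satisfiable when both indexes are pooled:
print(any(v == "0.3.8" for _, v in first_index(INDEXES)))        # False
print(any(v == "0.3.8" for _, v in unsafe_best_match(INDEXES)))  # True
```

This is why dropping either index URL, or the strategy flag, makes the pinned version unresolvable.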
Karma bless you for this recipe @shigabeev!
I'll drop some additional steps here in case anybody else needs them.
Download the CUDA Toolkit for your CUDA version from https://developer.nvidia.com/cuda-toolkit-archive
Install it using the installation option appropriate for your platform.
Then use the steps provided above.
Hi @shigabeev,
I have tried a similar approach to install llama-cpp-python with CUDA GPU support enabled. Here is my script:
#!/bin/bash
rm -rf test-venv
uv venv test-venv
. test-venv/bin/activate
uv pip install --upgrade pip
export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
#--index-url https://pypi.org/simple \
#--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
#--index-strategy unsafe-best-match
#uv pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --no-cache-dir --force-reinstall --upgrade
#uv pip install --no-cache-dir -r requirements.txt
Here is the output in the terminal:
Using Python 3.12.7 environment at sorbobot-venv
Resolved 1 package in 53ms
Prepared 1 package in 1ms
Installed 1 package in 21ms
+ pip==25.1.1
Using Python 3.10.12 environment at /usr
Resolved 6 packages in 304ms
Prepared 6 packages in 454ms
error: Failed to install: jinja2-3.1.6-py3-none-any.whl (jinja2==3.1.6)
Caused by: failed to create directory `/usr/local/lib/python3.10/dist-packages/jinja2-3.1.6.dist-info`
Caused by: Permission denied (os error 13)
I am working on Ubuntu 22.04 with cuda.18. I am using the wheel index --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122, which refers to CUDA 12.2, because it worked before on my computer with llama-cpp-python 0.3.4. Do you know where this error comes from?
Hi @michelonfrancoisSUMMIT, I think this problem is caused by the --system flag passed to uv pip install:
error: Failed to install: jinja2-3.1.6-py3-none-any.whl (jinja2==3.1.6)
  Caused by: failed to create directory /usr/local/lib/python3.10/dist-packages/jinja2-3.1.6.dist-info
  Caused by: Permission denied (os error 13)
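A stdlib sketch of why that happens: with --system, uv targets the interpreter's global site-packages, which a non-root user typically cannot write to. Inside a Docker build (running as root) this is fine; on a desktop Ubuntu it fails exactly as above.

```python
# Print where a --system install would land and whether we may write there.
import os
import sysconfig

site_packages = sysconfig.get_paths()["purelib"]
print(site_packages, "writable:", os.access(site_packages, os.W_OK))
```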
This worked for me:
export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install --force-reinstall llama-cpp-python==0.3.8
I still need to check the CUDA compatibility, but at least it is installed
Hi @DanielRosiqueEgea ,
Thanks for the tip. With the following version it works for me, and I can run LLMs on my GPU too.
. sorbobot-venv/bin/activate
uv pip install --upgrade pip
export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install llama-cpp-python==0.3.9
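Once one of these recipes succeeds, a quick sanity check is to ask the bindings whether GPU offload was compiled in. This assumes llama_supports_gpu_offload (a llama.cpp C API function) is re-exported at the package top level, as in recent llama-cpp-python releases:

```python
# Returns True only when the installed wheel was built with a GPU backend.
def cuda_build_ok() -> bool:
    try:
        import llama_cpp  # fails if the package is missing or broken
    except ImportError:
        return False
    return bool(llama_cpp.llama_supports_gpu_offload())

print(cuda_build_ok())
```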
There is a fork that provides up-to-date CUDA builds (cu124, cu126 / Windows, Linux): https://github.com/JamePeng/llama-cpp-python/releases