
How to install the latest version with GPU support

shigabeev opened this issue 6 months ago · 5 comments

Hey, I've been struggling for a month to install the latest version with CUDA. It was a nightmare.

So here is a guide on how to do it.

TL;DR Dockerfile syntax:

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y build-essential \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors \
    && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \
    && apt-get clean

RUN pip install uv
RUN uv init .

# Each RUN step gets a fresh shell, so the exports must live in the same
# layer as the build. CMAKE_CUDA_ARCHITECTURES=75 targets Turing GPUs
# (e.g. T4, RTX 20xx); adjust it for your card.
RUN export CC=/usr/bin/gcc CXX=/usr/bin/g++ \
    && export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH \
    && CMAKE_ARGS="-DGGML_CUDA=on \
            -DCMAKE_CUDA_ARCHITECTURES=75 \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \
    uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
        --index-url https://pypi.org/simple \
        --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
        --index-strategy unsafe-best-match

Explanation: installing for CPU is a piece of cake. Building from source with GPU support is slow and labour-intensive, so the best approach is to install from the provided wheels. The catch is that the GitHub wheel index doesn't serve the latest release: the newest version was 0.3.8, while GitHub only had 0.3.4, which is relatively recent but doesn't support Gemma 3.
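As a sanity check, you can ask each index which versions it actually serves. This uses the `pip index` subcommand (available in pip ≥ 21.2, still marked experimental):

```shell
# List the llama-cpp-python versions served by PyPI vs. the CUDA wheel index.
pip index versions llama-cpp-python --index-url https://pypi.org/simple
pip index versions llama-cpp-python \
    --index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
```

If the second command shows an older version than the first, you need the dual-index setup described below.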

First we need to point the build at the gcc and g++ compilers; for some reason the build fails without this. Note that in a Dockerfile each RUN step runs in its own shell, so these exports must sit in the same RUN as the build step; in a plain shell session they look like this:

export CC=/usr/bin/gcc CXX=/usr/bin/g++
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH

Linux dependencies:

    apt-get install -y build-essential \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors \
    && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd \

Disabling the examples and tests shortens the build and the traceback in case anything fails. And it likely will:

-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 \

The old LLAMA_CUBLAS flag is obsolete, so we need to replace it with:

-DGGML_CUDA=on

uv somehow manages to install this where pip can't:

uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
--index-url https://pypi.org/simple \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
--index-strategy unsafe-best-match

Here we need to provide both --index-url https://pypi.org/simple and --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122, because otherwise either version 0.3.8 or the CUDA wheels won't be visible to the resolver. Replace cu122 with your CUDA version. --index-strategy unsafe-best-match is also required; without it the install didn't build.

shigabeev avatar May 02 '25 17:05 shigabeev

Karma bless you for this recipe @shigabeev !

I drop here some additional steps just in case anybody else needs them.

Download the CUDA Toolkit matching your version from https://developer.nvidia.com/cuda-toolkit-archive
Install it using the following options: --silent --no-drm --no-man-page --toolkit
Export the location of the CUDA compiler: export CUDACXX=/usr/local//bin/nvcc
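For reference, a sketch of those steps with a concrete runfile. The runfile name, URL, and version below are illustrative, not a recommendation — pick the runfile matching your driver and CUDA version from the archive page:

```shell
# Illustrative only: substitute the runfile for your CUDA version.
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
# Unattended toolkit-only install (no driver, no man pages).
sh cuda_12.2.0_535.54.03_linux.run --silent --no-drm --no-man-page --toolkit
# Point the build at the freshly installed CUDA compiler.
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
```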

Use the steps provided above.

LDelPinoNT avatar May 07 '25 11:05 LDelPinoNT

Hi @shigabeev,

I have tried a similar version to install llama-cpp-python with CUDA GPU enabled. Here is my file:

#!/bin/bash

rm -rf test-venv

uv venv test-venv
. test-venv/bin/activate

uv pip install --upgrade pip

export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install --system --upgrade --force-reinstall llama-cpp-python==0.3.8 \
#--index-url https://pypi.org/simple \
#--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
#--index-strategy unsafe-best-match

#uv pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --no-cache-dir --force-reinstall --upgrade
#uv pip install --no-cache-dir -r requirements.txt

Here is the output in the terminal:

Using Python 3.12.7 environment at sorbobot-venv
Resolved 1 package in 53ms
Prepared 1 package in 1ms
Installed 1 package in 21ms
 + pip==25.1.1
Using Python 3.10.12 environment at /usr
Resolved 6 packages in 304ms
Prepared 6 packages in 454ms
error: Failed to install: jinja2-3.1.6-py3-none-any.whl (jinja2==3.1.6)
  Caused by: failed to create directory `/usr/local/lib/python3.10/dist-packages/jinja2-3.1.6.dist-info`
  Caused by: Permission denied (os error 13)

I am working on Ubuntu 22.04 with cuda.18. I am using the wheel index --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122, which refers to CUDA 12.2, because it worked on my computer before with llama-cpp-python 0.3.4. Do you know where the error comes from?

michelonfrancoisSUMMIT avatar May 19 '25 06:05 michelonfrancoisSUMMIT

Hi @michelonfrancoisSUMMIT, I think this problem is caused by the --system flag passed to uv pip install:

error: Failed to install: jinja2-3.1.6-py3-none-any.whl (jinja2==3.1.6) Caused by: failed to create directory /usr/local/lib/python3.10/dist-packages/jinja2-3.1.6.dist-info Caused by: Permission denied (os error 13)

This worked for me:


export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install  --force-reinstall llama-cpp-python==0.3.8

I still need to check the CUDA compatibility, but at least it is installed.
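One way to check that the wheel was actually built with CUDA (a sketch, assuming the install above succeeded) is the low-level llama_supports_gpu_offload() binding exposed by llama-cpp-python:

```shell
# Prints True only if llama.cpp was compiled with a GPU backend.
python -c "from llama_cpp import llama_supports_gpu_offload as g; print(g())"
# At runtime, load a model with n_gpu_layers=-1 and watch nvidia-smi
# to confirm layers are actually offloaded to VRAM.
```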

DanielRosiqueEgea avatar May 19 '25 12:05 DanielRosiqueEgea

Hi @DanielRosiqueEgea ,

Thanks for the tip. With the following version it works for me, and I can run LLMs on my GPU too.

. sorbobot-venv/bin/activate

uv pip install --upgrade pip

export CC=/usr/bin/gcc CXX=/usr/bin/g++ CUDA_PATH=/usr/local/cuda CUDACXX=/usr/local/cuda/bin/nvcc
export LD_LIBRARY_PATH=/usr/lib/gcc/$(gcc -dumpmachine)/$(gcc -dumpversion):$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_TESTS=OFF" FORCE_CMAKE=1 CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
uv pip install llama-cpp-python==0.3.9

michelonfrancoisSUMMIT avatar May 20 '25 09:05 michelonfrancoisSUMMIT

There is a fork containing the latest CUDA builds (cu124, cu126 / Windows, Linux). https://github.com/JamePeng/llama-cpp-python/releases

John6666cat avatar May 21 '25 12:05 John6666cat