
ERROR: Could not build wheels for llama-cpp-python

Open · inst32i opened this issue on Jul 23 '24 · 19 comments

Current Behavior

I ran the following: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

An error occurred: ERROR: Failed building wheel for llama-cpp-python

Environment and Context

  • Physical hardware (lscpu):
    Architecture: x86_64; CPU op-mode(s): 32-bit, 64-bit; Address sizes: 46 bits physical, 48 bits virtual; Byte Order: Little Endian
    CPU(s): 16; On-line CPU(s) list: 0-15
    Vendor ID: GenuineIntel; Model name: Intel Xeon Processor (Skylake, IBRS); CPU family: 6; Model: 85; Thread(s) per core: 1; Core(s) per socket: 1; Socket(s): 16; Stepping: 4; BogoMIPS: 4389.68
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni md_clear
    Virtualization features: Hypervisor vendor: KVM; Virtualization type: full
    Caches (sum of all): L1d: 512 KiB (16 instances); L1i: 512 KiB (16 instances); L2: 64 MiB (16 instances); L3: 256 MiB (16 instances)
    NUMA: NUMA node(s): 1; NUMA node0 CPU(s): 0-15
    Vulnerabilities: Itlb multihit: KVM: Mitigation: VMX unsupported; L1tf: Mitigation; PTE Inversion; Mds: Mitigation; Clear CPU buffers; SMT Host state unknown; Meltdown: Mitigation; PTI; Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown; Retbleed: Mitigation; IBRS; Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp; Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization; Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected; Srbds: Not affected; Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown

  • Operating System: Ubuntu 22.04

  • SDK version:

$ python3 --version   # Python 3.11
$ make --version      # GNU Make 4.3
$ g++ --version       # g++ 11.4.0

Failure Information (for bugs)

... FAILED: vendor/llama.cpp/examples/llava/llama-llava-cli
: && /usr/bin/g++ -pthread -B /mnt/x_env/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llama-llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llama-llava-cli -Wl,-rpath,/tmp/tmp6bws6ysg/build/vendor/llama.cpp/src:/tmp/tmp6bws6ysg/build/vendor/llama.cpp/ggml/src: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/src/libllama.so vendor/llama.cpp/ggml/src/libggml.so && :
/mnt/x_env/compiler_compat/ld: warning: libcuda.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libgomp.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libdl.so.2, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libpthread.so.0, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: librt.so.1, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemCreate'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_barrier@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressReserve'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemUnmap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_parallel@GOMP_4.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemSetAccess'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGet'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_thread_num@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressFree'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuGetErrorString'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_single_start@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGetAttribute'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemMap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemRelease'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_num_threads@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

*** CMake build failed
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

Steps to Reproduce

  1. conda activate <my_env>
  2. CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

NVIDIA Driver Version: 550.54.14, CUDA Toolkit Version: V12.4.99

inst32i · Jul 23 '24

same issue here ...

gillbates · Jul 23 '24

same issue here too

bteinstein · Jul 24 '24

same issue here too

XingchenMengxiang · Jul 24 '24

Same here

TobiasKlapper · Jul 24 '24

Same here as well.

SweetestRug · Jul 24 '24

Same here too

bodybreaker · Jul 25 '24

(Quoting the original report above.)

I solved this problem. It happens when the CUDA driver version differs from the CUDA toolkit version.

Check the CUDA driver version with nvidia-smi,

and check the CUDA toolkit version with conda list | grep cuda-toolkit.

My versions were 12.2 and 11.8.
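
For reference, a quick way to compare the two (a sketch; the grep patterns and the conda package name are assumptions and may differ on your system):

# Driver-side CUDA version, as reported by the driver
nvidia-smi | grep "CUDA Version"

# Toolkit-side CUDA version, whichever applies to your setup
nvcc --version                      # system-wide toolkit, if installed
conda list | grep -i cuda-toolkit   # conda-installed toolkit, if any

# If the major.minor versions disagree, align them, then rebuild:
# CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python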

bodybreaker · Jul 29 '24

Same here. Installation worked fine with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for llama-cpp-python <= 2.79.0. I now get the same error as OP for llama-cpp-python >= 2.80.0, whether I use CMAKE_ARGS="-DLLAMA_CUBLAS=on" or CMAKE_ARGS="-DGGML_CUDA=on".

Viagounet · Jul 31 '24

same issue here too in WSL2

hhhhpaaa · Aug 10 '24

same issue here too, WSL2 on Windows 10.

gilbertc · Aug 12 '24

same issue here

tigert1998 · Aug 20 '24

I found a workaround to fix this issue:

  1. clone this project and check out the version you would like to install
  2. build this project with CMake
  3. then here comes the key part: overwrite pyproject.toml with the following content
# [build-system]
# requires = ["scikit-build-core[pyproject]>=0.9.2"]
# build-backend = "scikit_build_core.build"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "llama_cpp_python"
dynamic = ["version"]
description = "Python bindings for the llama.cpp library"
readme = "README.md"
license = { text = "MIT" }
authors = [
    { name = "Andrei Betlen", email = "[email protected]" },
]
dependencies = [
    "typing-extensions>=4.5.0",
    "numpy>=1.20.0",
    "diskcache>=5.6.1",
    "jinja2>=2.11.3",
]
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]


[project.optional-dependencies]
server = [
    "uvicorn>=0.22.0",
    "fastapi>=0.100.0",
    "pydantic-settings>=2.0.1",
    "sse-starlette>=1.6.1",
    "starlette-context>=0.3.6,<0.4",
    "PyYAML>=5.1",
]
test = [
    "pytest>=7.4.0",
    "httpx>=0.24.1",
    "scipy>=1.10",
]
dev = [
    "black>=23.3.0",
    "twine>=4.0.2",
    "mkdocs>=1.4.3",
    "mkdocstrings[python]>=0.22.0",
    "mkdocs-material>=9.1.18",
    "pytest>=7.4.0",
    "httpx>=0.24.1",
]
all = [
    "llama_cpp_python[server,test,dev]",
]

# [tool.scikit-build]
# wheel.packages = ["llama_cpp"]
# cmake.verbose = true
# cmake.minimum-version = "3.21"
# minimum-version = "0.5.1"
# sdist.include = [".git", "vendor/llama.cpp/*"]

[tool.setuptools.packages.find]
include = ["llama_cpp"]

[tool.setuptools.package-data]
"llama_cpp" = ["lib/*"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "llama_cpp/__init__.py"

[project.urls]
Homepage = "https://github.com/abetlen/llama-cpp-python"
Issues = "https://github.com/abetlen/llama-cpp-python/issues"
Documentation = "https://llama-cpp-python.readthedocs.io/en/latest/"
Changelog = "https://llama-cpp-python.readthedocs.io/en/latest/changelog/"

[tool.pytest.ini_options]
testpaths = "tests"
  4. run pip install . --verbose (see the sketch below for the full sequence)
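
A sketch of steps 1-4 end to end, assuming the tag v0.2.90 as an example and a CUDA toolkit already installed (the tag and the exact CMake flags are illustrative, not prescribed by the steps above):

# 1. Clone and check out the desired version (tag below is only an example)
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
git checkout v0.2.90
git submodule update --init --recursive

# 2. Build the vendored llama.cpp with CMake, CUDA enabled
cmake -B build -S . -DGGML_CUDA=on
cmake --build build --config Release

# 3. Replace pyproject.toml with the setuptools-based content shown above

# 4. Install from the source tree
pip install . --verbose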

tigert1998 · Aug 20 '24

Adding the path to libcuda.so to the LD_LIBRARY_PATH environment variable allows the examples to link so that the build can succeed.
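
A minimal sketch of that idea (the library locations shown are common defaults, not guaranteed on every system):

# Locate the driver's libcuda.so.1
ldconfig -p | grep libcuda
# Typical locations: /usr/lib/x86_64-linux-gnu (driver) or /usr/local/cuda/lib64/stubs (stub only)

# Point the build at that directory and retry
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose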

blkqi · Aug 29 '24

Hello @blkqi

How did it work for you? Could you please share which environment variables or path settings you set?

PurnaChandraPanda · Sep 15 '24

Thank you @blkqi. Your advice really helped me. In my case, I used a Dockerfile like this:

ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat/libcuda.so
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.2.90
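
If it helps, the resulting image can then be built and smoke-tested roughly like this (the image name is an example, and --gpus assumes the NVIDIA Container Toolkit is installed):

docker build -t llama-cpp-cuda .
docker run --rm --gpus all llama-cpp-cuda python3 -c "from llama_cpp import Llama; print('import ok')"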

JHH11 · Sep 16 '24

Installed correctly with the following command (Ubuntu 24.04):

CMAKE_ARGS="-DGGML_CUDA=on" LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu"  pip install llama-cpp-python

In general, put in LD_LIBRARY_PATH the directory containing libcuda.so.1, which can be found with the following command:

 dpkg -S libcuda.so.1 

fillumina · Nov 25 '24

For me with conda it worked to build with LD_LIBRARY_PATH="<path_to>/miniconda3/envs/comfy/lib" pip install llama-cpp-python
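
A sketch of the same idea using the active environment's lib directory via $CONDA_PREFIX (assumes the CUDA libraries were actually installed into that environment, e.g. through a conda cuda/cudatoolkit package; the environment name is only an example):

conda activate comfy                     # example environment name
ls "$CONDA_PREFIX/lib" | grep -i cuda    # confirm the CUDA libraries are present
LD_LIBRARY_PATH="$CONDA_PREFIX/lib" CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python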

kampelmuehler · Jan 29 '25

For me with conda it worked to build with LD_LIBRARY_PATH="<path_to>/miniconda3/envs/comfy/lib" pip install llama-cpp-python

This worked for me. Thanks!

amangoel · Feb 21 '25

For me with conda it worked to build with LD_LIBRARY_PATH="<path_to>/miniconda3/envs/comfy/lib" pip install llama-cpp-python

This solved it for me, thanks!

koen-aerts · Mar 17 '25