
Please release a CUDA build for v0.3.5

Open ParisNeo opened this issue 10 months ago • 9 comments

Hi there. I see there is a Metal build for v0.3.5. Would you please release a CUDA version?

Best regards

ParisNeo avatar Feb 07 '25 22:02 ParisNeo

Agree. I've manually built a CUDA version, but an official prebuilt release would be more convenient for most users.

la1ty avatar Feb 09 '25 06:02 la1ty

> Agree. I've manually built a CUDA version, but an official prebuilt release would be more convenient for most users.

How? Is there a reference for a manual build?

Amrabdelhamed611 avatar Feb 09 '25 10:02 Amrabdelhamed611

The latest version is v0.3.7. You can follow the steps in the CI workflow.

For Windows users, here's my two cents:

  1. (Optional) Uninstall all MinGW tools (clang, gcc, etc.), removing everything, so they don't interfere with MSVC.
  2. Install Visual Studio 2022 with MSVC 2022, CMake, and the Windows SDK. If you need to build with CUDA < 12.4, you should also install MSVC 2019. (You may need to add the directory containing cmake.exe to PATH manually. Make sure that when you call cmake in PowerShell it resolves to the VS copy of cmake.exe.)
  3. Install CUDA.
  4. Copy the four files from the CUDA MSBuildExtensions directory to the VS BuildCustomizations directory. (Copying everything there may be useful.)
  5. Clone the repository together with its llama.cpp submodule (git clone --recurse-submodules).
  6. Activate the Python environment and run the following commands in PowerShell:
$env:CMAKE_ARGS = "-DGGML_CUDA=ON"
python -m pip install build wheel
python -m build --wheel

If you need to build it with CUDA<12.4, use MSVC 2019:

$env:CMAKE_ARGS = "-DGGML_CUDA=ON -DCMAKE_GENERATOR_TOOLSET=v142,host=x64,version=14.29"
python -m pip install build wheel
python -m build --wheel
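The only difference between the two CMAKE_ARGS variants above is the generator toolset. As a small sketch (the helper name is mine, not part of the project), the choice can be made explicit:

```python
def cmake_args_for_cuda(cuda_version):
    """Return the CMAKE_ARGS string from the steps above: a plain CUDA build
    for toolkit >= 12.4, the MSVC 2019 (v142) toolset for older toolkits."""
    args = "-DGGML_CUDA=ON"
    if cuda_version < (12, 4):
        # Older CUDA toolkits only integrate with the v142 (MSVC 2019) toolset.
        args += " -DCMAKE_GENERATOR_TOOLSET=v142,host=x64,version=14.29"
    return args


print(cmake_args_for_cuda((12, 5)))  # plain CUDA build
print(cmake_args_for_cuda((12, 1)))  # adds the v142 toolset
```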

la1ty avatar Feb 09 '25 10:02 la1ty

@abetlen would you please add the workflow suggested by @la1ty to automate the generation of the builds as you release new versions?

ParisNeo avatar Feb 09 '25 12:02 ParisNeo

+1 for pre-built wheels

ZiyaCu avatar Feb 10 '25 14:02 ZiyaCu

@ZiyaCu @ParisNeo @la1ty, check out this repo: the textgen-webui releases include llama-cpp-python CUDA wheels.

The only downside is that these wheels can't be imported using import llama_cpp. Instead, you should use import llama_cpp_cuda or import llama_cpp_cuda_tensorcore, depending on the wheel you installed.

You can find the wheels in the requirements file: 🔗 Requirements.txt

Or check the full release here: 🔗 llama-cpp-python-cuBLAS-wheels Release
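Since those wheels rename the module, a minimal fallback shim (the function name is my own, not part of either project) can hide which variant is installed:

```python
import importlib


def load_llama_cpp(candidates=("llama_cpp",
                               "llama_cpp_cuda",
                               "llama_cpp_cuda_tensorcore")):
    """Return the first importable llama-cpp-python variant, trying the
    standard module name first, then the renamed CUDA wheels."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates} is installed")


# llama_cpp = load_llama_cpp()  # then use llama_cpp.Llama(...) as usual
```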

Amrabdelhamed611 avatar Feb 12 '25 13:02 Amrabdelhamed611

@Amrabdelhamed611 thanks a lot, I'll take a look. I am using this in lollms, which should work on all kinds of systems, and it is a real pain having to write custom code for every configuration.

ParisNeo avatar Feb 12 '25 13:02 ParisNeo

PS E:\llama-cpp-python> conda activate CUDA125-py312
(CUDA125-py312) PS E:\llama-cpp-python> $env:CMAKE_ARGS = "-DGGML_CUDA=ON"
(CUDA125-py312) PS E:\llama-cpp-python> python -m pip install build wheel
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/, http://mirrors.aliyun.com/pypi/simple/
Collecting build
  Downloading http://mirrors.aliyun.com/pypi/packages/84/c2/80633736cd183ee4a62107413def345f7e6e3c01563dbca1417363cf957e/build-1.2.2.post1-py3-none-any.whl (22 kB)
Requirement already satisfied: wheel in d:\software\minipy312\envs\cuda125-py312\lib\site-packages (0.45.1)
Requirement already satisfied: packaging>=19.1 in d:\software\minipy312\envs\cuda125-py312\lib\site-packages (from build) (24.2)
Collecting pyproject_hooks (from build)
  Downloading http://mirrors.aliyun.com/pypi/packages/bd/24/12818598c362d7f300f18e74db45963dbcb85150324092410c8b49405e42/pyproject_hooks-1.2.0-py3-none-any.whl (10 kB)
Collecting colorama (from build)
  Downloading http://mirrors.aliyun.com/pypi/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: pyproject_hooks, colorama, build
Successfully installed build-1.2.2.post1 colorama-0.4.6 pyproject_hooks-1.2.0
(CUDA125-py312) PS E:\llama-cpp-python> python -m build --wheel

  • Creating isolated environment: venv+pip...
  • Installing packages in isolated environment:
    • scikit-build-core[pyproject]>=0.9.2
  • Getting build dependencies for wheel...
  • Building wheel...
*** scikit-build-core 0.10.7 using CMake 3.31.4 (wheel)
*** Configuring CMake...
2025-02-22 02:34:20,699 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file C:\Users\ADMINI~1\AppData\Local\Temp\tmpui8yd0_s\build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.26100.
-- The C compiler identification is MSVC 19.43.34808.0
-- The CXX compiler identification is MSVC 19.43.34808.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.43.34808/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.43.34808/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.47.1.windows.2")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM: x64
-- Including CPU backend
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Adding CPU backend variant ggml-cpu: /arch:AVX2 GGML_AVX2;GGML_FMA;GGML_F16C
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5/include (found version "12.5.82")
-- CUDA Toolkit found
-- Using CUDA architectures: native
CMake Error at D:/software/Minipy312/envs/CUDA125-py312/Lib/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeDetermineCompilerId.cmake:614 (message):
  No CUDA toolset found.
Call Stack (most recent call first):
  D:/software/Minipy312/envs/CUDA125-py312/Lib/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  D:/software/Minipy312/envs/CUDA125-py312/Lib/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  D:/software/Minipy312/envs/CUDA125-py312/Lib/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeDetermineCUDACompiler.cmake:131 (CMAKE_DETERMINE_COMPILER_ID)
  vendor/llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt:25 (enable_language)

-- Configuring incomplete, errors occurred!

*** CMake configuration failed

ERROR Backend subprocess exited when trying to invoke build_wheel

dw5189 avatar Feb 21 '25 18:02 dw5189

@dw5189 I can think of two possible causes:

  1. Make sure you are using the VS copy of cmake.exe to compile this project. When I run cmake --version in PowerShell it returns cmake version 3.29.5-msvc4. (I tried the MinGW version and it failed. Your log looks fine on this point, though, so good luck.)
  2. Copy the four files from the CUDA MSBuildExtensions directory to the VS BuildCustomizations directory. If you are not sure how, search for "No CUDA toolset found" in any web search engine; it should return plenty of pages with details.
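That copy step can be scripted. A minimal sketch (the function name and both example paths are mine; the real directories depend on your CUDA and VS versions):

```python
import pathlib
import shutil


def copy_build_customizations(src_dir, dst_dir):
    """Copy CUDA's MSBuild integration files (the files under
    MSBuildExtensions) into the VS BuildCustomizations folder."""
    dst = pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(pathlib.Path(src_dir).iterdir()):
        if f.is_file():
            shutil.copy2(f, dst / f.name)  # copy2 preserves timestamps
            copied.append(f.name)
    return copied


# Example invocation (hypothetical paths, adjust to your installation):
# copy_build_customizations(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5"
#     r"\extras\visual_studio_integration\MSBuildExtensions",
#     r"C:\Program Files\Microsoft Visual Studio\2022\Professional"
#     r"\MSBuild\Microsoft\VC\v170\BuildCustomizations",
# )
```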

la1ty avatar Feb 22 '25 08:02 la1ty