
GPU Support Missing in Version >=0.3.5 on Windows with CUDA 12.4 and RTX 3090

Open · mcglynnfinn opened this issue 9 months ago · 2 comments

Issue Description:

I'm experiencing a discrepancy between version 0.3.4 and later versions (>=0.3.5) regarding GPU utilization:

Version 0.3.4 (Prebuilt Wheel): The prebuilt wheel for 0.3.4 loads the model onto the GPU; however, it's not compatible with phi4.

Version >=0.3.5: There are no prebuilt wheels available for these versions, and when building from source, only the CPU is being used—the model does not load onto the GPU.

System Details:

Operating System: Windows 11
CUDA Version: 12.4
GPU: RTX 3090 24GB

Steps Taken:

1. Installed version 0.3.4 via the prebuilt wheel – confirmed GPU loading (but the phi4 incompatibility remains).
2. Upgraded to version 0.3.5 (and above) by building from source with CUDA support enabled.
3. Verified that the build settings include -DGGML_CUDA=on and confirmed that the system has CUDA 12.4 installed.

Despite these configurations, the build defaults to CPU usage and the model never loads onto the GPU. Could you please advise whether this is expected behavior for versions >=0.3.5, or whether there might be an issue with GPU detection/configuration on Windows 11 with CUDA 12.4? Any guidance or troubleshooting steps to enable GPU support for these versions would be greatly appreciated. For reference, a sketch of the commands I ran is below.
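A minimal sketch of the build and verification steps described above (PowerShell syntax; the CMake flag is the one I set, and the model path is only an example):

```powershell
# Build llama-cpp-python from source with CUDA enabled (PowerShell syntax).
$env:CMAKE_ARGS = "-DGGML_CUDA=on"
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```

And the quick check I use to see whether a build actually offloads to the GPU: with verbose=True, a CUDA-enabled build prints a line like "offloaded N/N layers to GPU" while the model loads, whereas a CPU-only build never does.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="C:/models/phi-4-Q4_K_M.gguf",  # example path, substitute your own GGUF file
    n_gpu_layers=-1,  # request that all layers be offloaded to the GPU
    verbose=True,     # prints the load log, including any GPU offload lines
)
print(llm("Hello", max_tokens=8))
```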

mcglynnfinn commented on Mar 09 '25

Maybe you can try my new prebuilt: https://github.com/JamePeng/llama-cpp-python/releases
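To install from that page, download the wheel matching your Python version and CUDA build and install it directly (the filename below is only a placeholder):

```powershell
pip install .\llama_cpp_python-<version>-cp312-cp312-win_amd64.whl
```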

JamePeng commented on Mar 09 '25

hey @mcglynnfinn, try installing the library using the command below:

```bash
CMAKE_ARGS="-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
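Note that the inline VAR=value prefix in that command is POSIX shell syntax (e.g. Git Bash or WSL); in Windows PowerShell the equivalent, using the same CMake flags, would be:

```powershell
# Set CMAKE_ARGS for the session, then rebuild/reinstall from source.
$env:CMAKE_ARGS = "-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF"
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```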

AleefBilal commented on Apr 18 '25