CUDA error 35
When I run ctransformers[cuda], I get the error: CUDA error 35 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:4236: CUDA driver version is insufficient for CUDA runtime version
However, the path "/home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu" does not exist on my machine. Here is my CUDA info:
And my package info:
How can I fix it?
Here is my code:
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "starcoder.ggmlv3.q8_0.bin",
    model_type="gpt_bigcode",
    top_p=0.95,
    temperature=0.2,
    max_new_tokens=512,
    threads=8,
    gpu_layers=50,
)
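For reference, CUDA error 35 is cudaErrorInsufficientDriver: the installed driver is older than the CUDA runtime the library was built against. A minimal sketch to print both versions, assuming libcudart.so can be resolved by the loader (the exact soname or path may differ on your system):

```python
import ctypes

# Assumption: libcudart.so is on the loader path; you may need the full soname
# (e.g. libcudart.so.12) or the library shipped inside the nvidia-cuda-runtime wheel.
cudart = ctypes.CDLL("libcudart.so")

driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
cudart.cudaDriverGetVersion(ctypes.byref(driver))    # highest CUDA version the driver supports
cudart.cudaRuntimeGetVersion(ctypes.byref(runtime))  # CUDA runtime the library links against

# Values are encoded as 1000*major + 10*minor, e.g. 11080 -> 11.8
fmt = lambda v: f"{v // 1000}.{(v % 1000) // 10}"
print("driver supports CUDA:", fmt(driver.value))
print("runtime version:     ", fmt(runtime.value))
```

If the driver number comes out lower than the runtime number, that is exactly the mismatch the error is reporting.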
Please update your NVIDIA Drivers and try again.
Hi @marella - I'm facing a similar issue on the servers I'm testing on. Upgrading the drivers might not be an option for me, since it's a shared system that several people use. Is it possible to manually build this library to run on CUDA 11.8 by making a few tweaks to the setup/cmake files?
Just an update: I managed to get it running on CUDA 11.8 😄! I knew it should work, since I was able to run a GGUF model using llama-cpp with the same CUDA version and drivers. Here is the fix if anyone wants to try it:
- Clone the library:
  git clone https://github.com/marella/ctransformers.git
- Edit this line to use an older CUDA version (https://github.com/marella/ctransformers/blob/main/models/ggml/ggml-cuda.cu#L136), changing it to:
  #if CUDART_VERSION >= 11000
- In the root folder, execute:
  CT_CUBLAS=1 pip install .
- Remember to install the CUDA libraries if you don't have them yet:
  pip install nvidia-cuda-runtime-cu11 nvidia-cublas-cu11

After rebuilding, you can smoke-test GPU offload as sketched below.
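This is not part of the original steps, just a hedged sketch of a quick check after the CT_CUBLAS=1 rebuild. The model file name is only an example (substitute any GGML/GGUF file you have locally); loading with gpu_layers > 0 exercises the CUDA code path, so a driver/runtime mismatch would show up immediately:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # example file name, use your own model
    model_type="mistral",
    gpu_layers=50,  # >0 forces the CUDA path; error 35 would surface here if still broken
)
print(llm("Hello", max_new_tokens=16))
```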
@marella - Do you think I can start a PR to include the step 2 fix, so this library is compatible with older CUDA versions too?
This should be integrated: CUDA 11.8 is working fine (and 11.8 should be compatible with 11.x), and even the latest PyTorch version (2.1 as of today) still supports it. As for updating NVIDIA drivers, that is not easy on a cloud provider node (or an HF Space, for example). Also, in my experience, updating the drivers on older cards (a 2070 Turing, for example) just makes them slower, so I stick with the best-performing version.
@gorkemgoknar - I have created a pull request to get this included in the main repo. In the meantime, until it is merged, anyone who doesn't want to make the manual change (it's a simple one anyway) can clone and build directly from my fork: https://github.com/sujeendran/ctransformers
Thank you @sujeendran. I actually built it with the fix.
I can confirm that with the fix it runs GGUF Zephyr or Mistral with nvidia-cuda-runtime-cu11==11.7.99.
Just a side note for GGUF: generation performance is nearly the same as with llama-cpp-python.
Can you run the GGUF format with GPU?
Yes, check the app.py here. GGUF works on both CPU and GPU, and by changing the number of layers offloaded to the GPU you can run only some of the ops on the GPU if your GPU does not have enough VRAM for the whole model:
https://huggingface.co/spaces/coqui/voice-chat-with-mistral
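Roughly, partial offload just means setting gpu_layers to fewer than the model's total layers; the rest stay on the CPU. A minimal sketch (the file name and layer count are placeholders, not taken from the Space above):

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "zephyr-7b-beta.Q4_K_M.gguf",  # example GGUF file, substitute your own
    model_type="mistral",
    gpu_layers=20,   # put ~20 of the model's layers on the GPU, keep the rest on the CPU
    threads=8,       # CPU threads for the layers that are not offloaded
)
print(llm("Why is the sky blue?", max_new_tokens=64))
```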