CUDA error 35
When I run ctransformers[cuda], I get the error: CUDA error 35 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:4236: CUDA driver version is insufficient for CUDA runtime version
However, the path "/home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu" does not exist on my machine. Here is my CUDA info:
And my package info:
How can I fix it?
Here is my code:
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "starcoder.ggmlv3.q8_0.bin",
    model_type="gpt_bigcode",
    top_p=0.95,
    temperature=0.2,
    max_new_tokens=512,
    threads=8,
    gpu_layers=50,
)
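For reference, CUDA error 35 is cudaErrorInsufficientDriver: the installed driver is older than the CUDA runtime the library was built against. A minimal sketch to print both versions, assuming libcudart.so can be resolved by the loader (the exact soname or path may differ on your system):

```python
import ctypes

# Assumption: libcudart.so is on the loader path; you may need the full soname
# (e.g. libcudart.so.12) or the library shipped inside the nvidia-cuda-runtime wheel.
cudart = ctypes.CDLL("libcudart.so")

driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
cudart.cudaDriverGetVersion(ctypes.byref(driver))    # highest CUDA version the driver supports
cudart.cudaRuntimeGetVersion(ctypes.byref(runtime))  # CUDA runtime the library links against

# Values are encoded as 1000*major + 10*minor, e.g. 11080 -> 11.8
fmt = lambda v: f"{v // 1000}.{(v % 1000) // 10}"
print("driver supports CUDA:", fmt(driver.value))
print("runtime version:     ", fmt(runtime.value))
```

If the driver number comes out lower than the runtime number, that is exactly the mismatch the error is reporting.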
Please update your NVIDIA Drivers and try again.
Hi @marella - I'm facing a similar issue on the servers I'm testing on. Upgrading the drivers might not be an option for me, since it's a shared system that several people use. Is it possible to manually build this library to run on CUDA 11.8 by making a few tweaks to the setup/cmake files?
Just an update: I managed to get it running on CUDA 11.8 😄! I knew it should work, since I was able to run a GGUF model using llama-cpp with the same CUDA version and drivers. Here is the fix if anyone wants to try it:
- Clone the library:
  git clone https://github.com/marella/ctransformers.git
- Edit this line to use an older CUDA version (https://github.com/marella/ctransformers/blob/main/models/ggml/ggml-cuda.cu#L136), changing it to:
  #if CUDART_VERSION >= 11000
- In the root folder, execute:
  CT_CUBLAS=1 pip install .
- Remember to install the CUDA libraries if you don't have them yet:
  pip install nvidia-cuda-runtime-cu11 nvidia-cublas-cu11

After rebuilding, you can smoke-test GPU offload as sketched below.
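This is not part of the original steps, just a hedged sketch of a quick check after the CT_CUBLAS=1 rebuild. The model file name is only an example (substitute any GGML/GGUF file you have locally); loading with gpu_layers > 0 exercises the CUDA code path, so a driver/runtime mismatch would show up immediately:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # example file name, use your own model
    model_type="mistral",
    gpu_layers=50,  # >0 forces the CUDA path; error 35 would surface here if still broken
)
print(llm("Hello", max_new_tokens=16))
```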
@marella - Do you think I can start a PR to include the step 2 fix, so this library is compatible with older CUDA versions too?
This should be integrated: CUDA 11.8 is working fine (and 11.8 should be compatible with 11.x), and even the latest PyTorch version (2.1 as of today) still supports it. As for updating NVIDIA drivers, that is not easy on a cloud provider node (or an HF Space, for example). Also, in my experience, updating the drivers on older cards (a 2070 Turing, for example) just makes them slower, so I stick with the best-performing version.
@gorkemgoknar - I have created a pull request to get this included in the main repo. In the meantime, until it is merged, anyone who doesn't want to make the manual change (it's a simple one anyway) can clone and build directly from my fork: https://github.com/sujeendran/ctransformers
Thank you @sujeendran. I actually built it with the fix.
I can confirm that with the fix it runs GGUF Zephyr or Mistral with nvidia-cuda-runtime-cu11==11.7.99.
Just a side note for GGUF: generation performance is nearly the same as with llama-cpp-python.
Can you run the GGUF format with GPU?
Yes, check the app.py here. GGUF works on both CPU and GPU, and by changing the number of layers offloaded to the GPU you can run only some of the ops on the GPU if your GPU does not have enough VRAM for the whole model:
https://huggingface.co/spaces/coqui/voice-chat-with-mistral
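Roughly, partial offload just means setting gpu_layers to fewer than the model's total layers; the rest stay on the CPU. A minimal sketch (the file name and layer count are placeholders, not taken from the Space above):

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "zephyr-7b-beta.Q4_K_M.gguf",  # example GGUF file, substitute your own
    model_type="mistral",
    gpu_layers=20,   # put ~20 of the model's layers on the GPU, keep the rest on the CPU
    threads=8,       # CPU threads for the layers that are not offloaded
)
print(llm("Why is the sky blue?", max_new_tokens=64))
```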