llama-cpp-python
CUDA Error: CUDA driver version is insufficient for CUDA runtime version
Discussed in https://github.com/abetlen/llama-cpp-python/discussions/1425
Originally posted by VijayAsokkumar May 3, 2024

Hi All,

I am using llama-cpp-python in my app, installed in a conda environment. I have built a chat application around the LLaMA 2 7B model with Python Flask. It runs fine on my laptop with an M1 chip, but when I deploy the app on an AWS g4dn.xlarge instance with a Tesla T4 GPU, the following error appears whenever the app calls into llama-cpp-python:
```
CUDA error 35 at /home/conda/feedstock_root/build_artifacts/llama.cpp_1703017359354/work/ggml-cuda.cu:493: CUDA driver version is insufficient for CUDA runtime version
GGML_ASSERT: /home/conda/feedstock_root/build_artifacts/llama.cpp_1703017359354/work/ggml-cuda.cu:493: !"CUDA error"
```
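For what it's worth, CUDA error 35 is `cudaErrorInsufficientDriver`: the CUDA runtime the build links against is newer than what the installed kernel driver supports. Here is a minimal sketch of how I check both versions from Python, assuming the usual sonames `libcuda.so.1` (installed by the NVIDIA driver) and `libcudart.so.12` (installed with the CUDA 12.x toolkit, here via conda); the library names may need adjusting on other setups:

```python
import ctypes

def _query(libname, symbol):
    # Ask the given CUDA library for its version, encoded e.g. as 12030 for 12.3.
    lib = ctypes.CDLL(libname)
    version = ctypes.c_int()
    status = getattr(lib, symbol)(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"{symbol} failed with status {status}")
    return version.value

# libcuda.so.1 ships with the kernel driver; libcudart.so.12 ships with the toolkit.
driver = _query("libcuda.so.1", "cuDriverGetVersion")
runtime = _query("libcudart.so.12", "cudaRuntimeGetVersion")

print(f"driver supports up to CUDA {driver // 1000}.{driver % 1000 // 10}")
print(f"runtime was built for CUDA {runtime // 1000}.{runtime % 1000 // 10}")
# Error 35 means the first number is lower than the second.
```

Note that `nvcc --version` reports the toolkit (i.e. runtime) version, not the driver; on the instance it shows: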
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
```
Essentially, I need suggestions in the following areas:

1. The supported CUDA driver version for the Tesla T4 GPU. The AWS instance runs Ubuntu 18.04; I noticed that the default CUDA driver there is version 9, and I have installed version 12.3. I need guidance on how to configure the latest CUDA driver to work with the conda environment.
2. Instructions on how to enable the app to utilize the GPU (a sketch of what I expect follows below).
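For reference, this is roughly how I expect to exercise the GPU once the driver issue is resolved; a minimal sketch assuming a CUDA-enabled build of llama-cpp-python (e.g. installed with `CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python`) and a hypothetical local GGUF model path:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical path; adjust
    n_gpu_layers=-1,  # offload all layers to the T4; 0 keeps everything on CPU
    verbose=True,     # the startup log should report the CUDA device and layers
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If the startup log with `verbose=True` never mentions a CUDA device, I assume the installed wheel was built CPU-only and needs to be reinstalled with the CUDA flag.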
Thanks,
Vijay Asokkumar