
CUDA Error : CUDA driver version is insufficient for CUDA runtime version

Open VijayAsokkumar opened this issue 1 year ago • 0 comments

Discussed in https://github.com/abetlen/llama-cpp-python/discussions/1425

Originally posted by VijayAsokkumar May 3, 2024

Hi All, I am using `llama-cpp-python` in my app, installed in a conda environment. I have built a chat application on the LLaMA 2 7B model with Python Flask. It works on my laptop with an M1 chip. However, when I deploy the app on an AWS g4dn.xlarge instance with a Tesla T4 GPU, I get the following error whenever the app calls into `llama-cpp-python`:

```
CUDA error 35 at /home/conda/feedstock_root/build_artifacts/llama.cpp_1703017359354/work/ggml-cuda.cu:493: CUDA driver version is insufficient for CUDA runtime version
GGML_ASSERT: /home/conda/feedstock_root/build_artifacts/llama.cpp_1703017359354/work/ggml-cuda.cu:493: !"CUDA error"
```
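For context, CUDA error 35 is `cudaErrorInsufficientDriver`: the host's GPU driver reports an older CUDA version than the one the runtime (here, the conda-built llama.cpp binary) was compiled against. A minimal sketch of what that check amounts to (the version strings are illustrative, and the per-component comparison is a simplification; CUDA 12.x drivers have minor-version compatibility in practice):

```python
def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Return True if the driver's reported CUDA version can serve the runtime.

    CUDA error 35 (cudaErrorInsufficientDriver) is raised when the driver's
    CUDA version is older than the version the runtime was built against.
    Simplified: real CUDA 12.x drivers also allow newer 12.y runtimes.
    """
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(driver_cuda) >= parse(runtime_cuda)

# Illustrative values matching this issue: an old CUDA 9 driver vs. a 12.3 runtime.
print(driver_supports_runtime("9.0", "12.3"))   # False -> error 35
print(driver_supports_runtime("12.4", "12.3"))  # True
```

So the fix is on the driver side of this comparison: the host driver must be new enough for the runtime the package was built against.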

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
```

Essentially, I need suggestions on the following areas:

1. The supported CUDA driver version for the Tesla T4 GPU (the AWS instance runs Ubuntu 18.04). The default CUDA driver appears to be version 9, and I have installed CUDA 12.3.
2. Guidance on how to configure the latest CUDA driver within the conda environment.
3. Instructions on how to enable the app to utilize the GPU.
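One thing worth checking: `nvcc --version` reports the CUDA *toolkit* installed in the environment, not the *driver* the kernel module provides. The driver version is what `nvidia-smi` prints in its banner line, and it must be upgraded on the host itself; conda can ship the CUDA runtime but not the driver. A small sketch that pulls the driver version and the maximum CUDA version that driver supports out of an `nvidia-smi` banner line (the sample line below is illustrative, not from this instance):

```python
import re

def parse_nvidia_smi_header(header: str) -> dict:
    """Extract the driver version and the max CUDA version the driver
    supports from the banner line printed by `nvidia-smi`."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", header)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", header)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }

# Illustrative banner; run `nvidia-smi` on the instance to see the real one.
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
print(parse_nvidia_smi_header(sample))
# {'driver': '535.104.05', 'cuda': '12.2'}
```

If the `cuda` value reported there is lower than the runtime version the package was built against (12.3 in this case), the driver needs to be upgraded on the host before error 35 will go away.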

Thanks, Vijay Asokkumar

VijayAsokkumar · May 07 '24 19:05