llm.c

no CUDA-capable device is detected

Open rucnyz opened this issue 10 months ago • 3 comments

I have already installed nvcc with sudo apt install nvidia-cuda-toolkit. When I run ./train_gpt2cu, it shows:

[GPT-2]
max_seq_len: 1024
vocab_size: 50257
num_layers: 12
num_heads: 12
channels: 768
num_parameters: 124439808
[CUDA ERROR] at file train_gpt2.cu:501:
no CUDA-capable device is detected

I am using WSL2 (Ubuntu 22.04) and nvcc 11.5.

Are there any packages that I haven't installed yet?

rucnyz, Apr 12 '24 01:04

It seems you have not installed a driver for your GPU. If that is the case, running nvidia-smi will fail as well. Refer to this guide:

https://docs.nvidia.com/cuda/wsl-user-guide/index.html

AndreSlavescu, Apr 12 '24 01:04

@AndreSlavescu Thanks for your reply! But nvidia-smi does work; it shows:

Thu Apr 11 22:05:22 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   54C    P5              7W /  120W |    1974MiB /   8188MiB |     26%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

rucnyz, Apr 12 '24 02:04

Okay, in that case the driver does seem to be installed. Certain installation methods can cause this kind of problem, so refer to this thread:

https://forums.developer.nvidia.com/t/no-cuda-capable-device-is-detected/39478

AndreSlavescu, Apr 12 '24 03:04

Thanks! The cause was that I had installed the wrong version of the CUDA toolkit. After reinstalling it, everything worked!

rucnyz, Apr 14 '24 20:04
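When nvidia-smi works but a CUDA program still reports "no CUDA-capable device is detected", a tiny standalone program can help separate a driver problem from a toolkit/runtime problem, independently of llm.c. The sketch below is hypothetical: the file name check_cuda.cu and the helper decode_cuda_version are illustrative and not part of llm.c, and it assumes nvcc is on the PATH. It prints the highest CUDA version the driver supports, the CUDA runtime version the program was built against, and then queries the devices using standard CUDA runtime calls.

```cuda
// check_cuda.cu: hypothetical standalone diagnostic, not part of llm.c.
// Build and run (assumption: nvcc on PATH): nvcc check_cuda.cu -o check_cuda && ./check_cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 12040 -> 12.4).
void decode_cuda_version(int version, int* major, int* minor) {
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}

int main() {
    int driver = 0, runtime = 0, major = 0, minor = 0;

    // Highest CUDA version the installed driver supports; 0 means no driver was found.
    cudaDriverGetVersion(&driver);
    decode_cuda_version(driver, &major, &minor);
    printf("Driver supports CUDA: %d.%d\n", major, minor);

    // Version of the CUDA runtime this program links against (normally the toolkit's version).
    cudaRuntimeGetVersion(&runtime);
    decode_cuda_version(runtime, &major, &minor);
    printf("CUDA runtime:         %d.%d\n", major, minor);

    // First call here that needs a device; with a broken setup it can fail with
    // "no CUDA-capable device is detected" (cudaErrorNoDevice).
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // List every visible device with its compute capability.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (compute capability %d.%d)\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If the device query fails here too while nvidia-smi works, the toolkit installation is the likely suspect, which matches the resolution in this thread. The CUDA on WSL user guide linked earlier covers how to install the toolkit under WSL2.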