CUDA_ERROR_NO_DEVICE "no CUDA-capable device is detected"
Hello all,
Thanks for your great work here! When I run using cudarc, I get the error:
called `Result::unwrap()` on an `Err` value: Cuda(Cuda(DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")))
Here is my system information:
$ nvidia-smi
Tue Jun 11 23:53:28 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.72 Driver Version: 536.45 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro M2000M On | 00000000:01:00.0 Off | N/A |
| N/A 0C P8 N/A / 200W | 0MiB / 4096MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 33 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
5.0
$ echo $CUDA_VISIBLE_DEVICES
0
I would appreciate any help!
Is PyTorch able to see the GPU? Also, what CUDA toolkit version is being targeted by cudarc? (If you're using cuda-version-from-build-system, is it being compiled on this machine?)
@EricLBuehler any more information on this issue? I'll close it in a week if not.
@coreylowman sorry for not getting back! I am running this on my GPU and PyTorch can see it (torch.cuda.is_available() == True).
@EricLBuehler are there any differences when using dynamic loading vs the dynamic-linking feature for cudarc? Also curious what toolkit version you are targeting in the cudarc features.
I am using cuda-version-from-build-system and dynamic-linking. How should I try dynamic loading?
If you don't enable the dynamic-linking feature, it will use dynamic loading.
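For example, reusing the feature names from this thread (just a sketch; keep whatever other features you actually need, e.g. cublas/nvrtc/f16):
cudarc = { version = "0.11.5", default-features = false, features = ["std", "driver", "cuda-version-from-build-system"] } # no dynamic-linking -> libraries are loaded at runtime (dlopen)
cudarc = { version = "0.11.5", default-features = false, features = ["std", "driver", "cuda-version-from-build-system", "dynamic-linking"] } # links against the CUDA libraries at build time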
🤔 Could you try targeting 12.2 (cuda-12020) instead of version from build system? Just curious if that would change anything.
Hmm yeah, same error. Current:
cudarc = { version = "0.11.5", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-12020"], default-features=false }
I've got nothing off the top of my head. Do you get this error if you git clone cudarc and try to run the unit tests?
cargo test --tests --no-default-features -F std,cuda-12050,driver
Is this running inside a docker container?
If that doesn't work, I'd probably drop down to the C++ level and verify that a simple example linking against CUDA can find the GPU. If that fails too, it at least tells us that PyTorch is doing something special that we need to copy.
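If it's easier to stay in Rust for that check, something roughly like this should be equivalent (a hand-rolled sketch that links libcuda directly and bypasses cudarc entirely; libcuda needs to be on the linker search path):

// Minimal CUDA driver-API check, independent of cudarc.
#[link(name = "cuda")]
extern "C" {
    fn cuInit(flags: u32) -> i32;                // CUresult, 0 = CUDA_SUCCESS
    fn cuDeviceGetCount(count: *mut i32) -> i32; // CUresult
}

fn main() {
    let mut count = 0i32;
    unsafe {
        let r = cuInit(0);
        println!("cuInit -> {r} (100 would be CUDA_ERROR_NO_DEVICE)");
        if r == 0 {
            let r = cuDeviceGetCount(&mut count);
            println!("cuDeviceGetCount -> {r}, {count} device(s) visible");
        }
    }
}

If cuInit already returns 100 there, the problem is between the process and the driver (environment, container config, library paths), not cudarc itself.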
Hi both, I also have a similar error:
DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted
[jzhao399@atl1-1-02-018-25-0 release]$ which nvidia-smi
/usr/bin/nvidia-smi
[jzhao399@atl1-1-02-018-25-0 release]$ nvidia-smi
Wed Jul 17 11:25:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:C1:00.0 Off | 0 |
| N/A 34C P0 43W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                                Usage      |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+
Via PyTorch this can be solved, but I'm not sure how to solve it here.
Thanks,
Jianshu
FWIW PyTorch bundles the CUDA runtime with fat binaries they compile AFAIK? So that'd be more of a static build vs cudarc here relying on dynamic linking?
AFAIK (and I don't know much on the topic), with docker containers your project needs:
- Container: the CUDA runtime libs (where cudarc and the PyTorch package would differ)
- Host: the supporting driver which nvidia-smi interacts with.
The container is then run with some extra config to add support for the GPU which mounts some extra libs/devices (which makes nvidia-smi work within the container IIRC).
> NVIDIA-SMI 535.72 Driver Version: 536.45 CUDA Version: 12.2
> Quadro M2000M
> Cuda compilation tools, release 12.5, V12.5.40
Are you still able to reproduce this issue?
- It looks like you had the kernel and CUDA runtime driver correctly aligned on the system (nvidia-smi output), but were building with CUDA 12.5 (nvcc --version)?
- The Quadro M2000M GPU is a Maxwell GM107 model, limited to CC 5.0 / sm_50. There shouldn't be any CUDA compat issues there with the different CUDA 12.x versions? 🤔
- Your follow-up comment confirmed the same failure when building cudarc against the CUDA 12.2 target, though that was with dynamic loading rather than dynamic linking.
The reproduction conditions weren't entirely clear. Potentially it was due to compiling with NVCC from CUDA 12.5 while the runtime host (CUDA 12.2) lacked a compat package?
This explanation notes that you should be fine when nvcc builds for an older version of the CUDA runtime than your system is running. But when it's the other way around, you can run into problems and need to install the compat packages instead.
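If it helps to confirm which case you're in, here's a rough sketch (hand-rolled FFI to libcuda, nothing cudarc-specific) that reports the CUDA version your driver actually supports:

// Queries the driver's supported CUDA version; no cuInit required.
#[link(name = "cuda")]
extern "C" {
    fn cuDriverGetVersion(version: *mut i32) -> i32; // CUresult
}

fn main() {
    let mut v = 0i32;
    let r = unsafe { cuDriverGetVersion(&mut v) };
    // v is encoded as 1000 * major + 10 * minor, e.g. 12020 for CUDA 12.2.
    println!("result {r}: driver supports up to CUDA {}.{}", v / 1000, (v % 1000) / 10);
}

If that prints 12.2 while your PTX/kernels were built with a 12.5 toolkit, you're in the forward-compatibility case and need either the compat package or a rebuild against the older toolkit.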
For additional clarity:
$ docker run --rm -it --gpus all fedora:41
# Just like when running on my container host (WSL2):
$ nvidia-smi --version
NVIDIA-SMI version : 550.54.14
NVML version : 550.54
DRIVER version : 551.78
CUDA Version : 12.4
$ nvcc --version
bash: nvcc: command not found
# Install nvidia's CUDA repo for Fedora 41:
$ dnf config-manager addrepo --from-repofile https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
# Install NVCC with CUDA 12.9:
$ dnf install -yq cuda-nvcc-12-9
$ /usr/local/cuda-12.9/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
Now watch what happens to the nvidia-smi output when I use the compat package for CUDA 12.9:
$ dnf install -yq cuda-compat-12-9
# Use the compat libs instead (now the CUDA runtime version is bumped):
$ LD_LIBRARY_PATH=/usr/local/cuda-12.9/compat nvidia-smi --version
NVIDIA-SMI version : 550.54.14
NVML version : 550.54
DRIVER version : 551.78
CUDA Version : 12.9
# For reference here are the files the compat package is providing:
$ ls /usr/local/cuda-12.9/compat
libcuda.so libcuda.so.575.57.08 libcudadebugger.so.575.57.08 libnvidia-nvvm.so.575.57.08 libnvidia-pkcs11-openssl3.so.575.57.08 libnvidia-ptxjitcompiler.so.575.57.08
libcuda.so.1 libcudadebugger.so.1 libnvidia-nvvm.so.4 libnvidia-nvvm70.so.4 libnvidia-ptxjitcompiler.so.1
Additional references:
- https://en.wikipedia.org/wiki/CUDA#GPUs_supported
- https://docs.nvidia.com/deploy/cuda-compatibility/#use-the-right-cuda-forward-compatibility-package
Closing as stale and likely a toolkit installation issue. Please reopen with additional details.