CUDA device cannot be loaded from pytorch
NVIDIA Open GPU Kernel Modules Version
575.51.03
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [x] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Fedora 41
Kernel Release
6.14.6
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 2070
Describe the bug
PyTorch cannot find the GPU in a container environment, although nvidia-smi works and correctly shows the card.
To Reproduce
- Install the NVIDIA Container Toolkit. From the NVIDIA CUDA repo (https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/) I installed the following packages (I am not sure all of them are necessary for containers):
  - kmod-nvidia-latest-dkms.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - libnvidia-cfg.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - libnvidia-gpucomp.x86_64 3:575.51.03-1.fc41 cuda-fedora41-x86_64
  - libnvidia-ml.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - nvidia-driver-cuda.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - nvidia-driver-cuda-libs.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - nvidia-kmod-common.noarch 3:570.148.08-1.fc41 cuda-fedora41-x86_64
  - nvidia-modprobe.x86_64 3:575.51.03-1.fc41 cuda-fedora41-x86_64
  - nvidia-persistenced.x86_64 3:570.148.08-1.fc41 cuda-fedora41-x86_64
- Create a container: `podman run --replace -it --device nvidia.com/gpu=all nvidia/cuda:12.9.0-cudnn-runtime-ubuntu24.04 /bin/bash`
- Use Python to install PyTorch in the official nvidia/cuda container: `pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128`
- In the Python shell:
  - `import torch`
  - `torch.cuda.is_available()`
- The error log shows that CUDA initialization fails and the device is not found.
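
For anyone trying to reproduce this, the check in the Python shell can be wrapped in a small script that also captures the initialization error text. This is only a sketch: the helper name `cuda_report` and the dictionary keys are made up for illustration, and it assumes that calling `torch.cuda.init()` is an acceptable way to force CUDA initialization so that the underlying exception becomes visible.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about PyTorch's view of CUDA without raising.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    report = {
        "torch_installed": False,
        "built_for_cuda": None,   # CUDA version the wheel was built for
        "cuda_available": False,
        "device_count": 0,
        "error": None,
    }
    if importlib.util.find_spec("torch") is None:
        report["error"] = "torch is not installed"
        return report
    try:
        import torch
    except Exception as exc:  # broken install, missing shared libraries, ...
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["torch_installed"] = True
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    try:
        report["cuda_available"] = bool(torch.cuda.is_available())
        report["device_count"] = int(torch.cuda.device_count())
        # Force initialization so a failure raises with a readable message
        # instead of only returning False from is_available().
        torch.cuda.init()
    except Exception as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run inside the container, the `error` field should contain the same initialization failure that the Python shell reports, which makes it easier to paste into the issue.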
Bug Incidence
Always
nvidia-bug-report.log.gz
After I switched to the POE driver the problem disappeared, so I did not get the chance to generate the log.
More Info
No response
Related to #797