cuda toolkit not installed for user

Open 81reap opened this issue 1 year ago • 2 comments

Steps To Recreate

  1. Perform a clean install of bazzite-nvidia.
  2. Login as the user.
  3. Check for cuda by running nvcc --version. It will fail to find the command.
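The check in step 3 can be scripted. The following is a minimal sketch, not part of the original report; `check_cmd` is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper: report whether a command is reachable through PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# On an affected install, nvidia-smi is expected to be found and nvcc is not.
check_cmd nvidia-smi || true
check_cmd nvcc || true
```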

Expected Behavior

rpm-ostree and nvidia-smi indicate that CUDA and the CUDA toolkit should be installed; however, nvcc --version fails.

reap@fedora:~$ nvidia-smi
Thu Feb 22 18:58:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    Off | 00000000:01:00.0 Off |                  Off |
| 30%   33C    P8               5W /  70W |      2MiB / 20475MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

reap@fedora:~$ rpm -qa | grep nvidia
nvidia-gpu-firmware-20240115-2.fc39.noarch
ublue-os-nvidia-addons-0.10-1.fc39.noarch
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.x86_64
nvidia-modprobe-545.29.06-1.fc39.x86_64
nvidia-persistenced-545.29.06-1.fc39.x86_64
nvidia-container-toolkit-base-1.14.5-1.x86_64
libnvidia-container1-1.14.5-1.x86_64
libnvidia-container-tools-1.14.5-1.x86_64
nvidia-container-toolkit-1.14.5-1.x86_64
xorg-x11-drv-nvidia-kmodsrc-545.29.06-2.fc39.x86_64
libva-nvidia-driver-0.0.11-1.fc39.x86_64
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.x86_64
nvidia-settings-545.29.06-1.fc39.x86_64
xorg-x11-drv-nvidia-power-545.29.06-2.fc39.x86_64
kmod-nvidia-6.7.5-201.fsync.fc39.x86_64-545.29.06-3.fc39.x86_64
xorg-x11-drv-nvidia-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-cuda-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-devel-545.29.06-2.fc39.x86_64

reap@fedora:~$ nvcc --version
# only works after the workaround

Hardware

  • B550I Aorus Pro AX
  • AMD Ryzen 7 5700G
  • Nvidia RTX 4000 SFF Ada Gen
  • 2x32GB @ 3200 MHz
  • 2TB NVMe drive

Setup Notes

  • Secure Boot is disabled in the BIOS.
  • The OS and KDE run on the AMD GPU. Steam games launch successfully on the Nvidia GPU.
  • After applying the workaround, PyTorch also runs successfully on the Nvidia GPU.

The Workaround

note :: The workaround does not fix the issue for podman containers running with CDI. Any workload that requires CUDA will have to be run in userspace.

note :: you may need to change the cuda version in these commands. See here
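To see which CUDA version to substitute, one option is to read it from the nvidia-smi header. This is a rough sketch that assumes the header keeps the "CUDA Version: X.Y" format shown earlier in this issue; `cuda_version_from_smi` is a hypothetical helper name. Keep in mind that this figure is the highest CUDA version the driver supports, not the version of any installed toolkit.

```shell
# Hypothetical helper: extract the "CUDA Version" field from nvidia-smi
# output supplied on stdin.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | cuda_version_from_smi
fi
```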

$ nvidia-smi
# this shows the correct output and reports cuda version 12.3 (the highest version the driver supports, not proof that the toolkit is installed)
$ nvcc --version
# this should fail to find nvcc
$ ls /usr/local
# this output does not contain cuda, which confirms that the cuda toolkit is not installed

$ wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
$ sudo sh cuda_12.3.2_545.23.08_linux.run
# this will require you to accept the licence first. You should only be installing the cuda toolkit, not the bundled driver, as the system already has nvidia drivers.
$ ls /usr/local
# now we have the cuda toolkit, but nvcc will still fail as it is not on your path

# add these two export lines to your ~/.bashrc (without the leading $) so that they are loaded in every new shell
$ export PATH=/usr/local/cuda-12.3/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ nvcc --version 
# nvcc now works
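
The export lines above prepend unconditionally, so re-sourcing ~/.bashrc adds duplicate entries. A possible variant for ~/.bashrc is sketched below; `prepend_once` and `cuda_dir` are hypothetical names, and the /usr/local/cuda-12.3 prefix is assumed from the commands above and may differ for other cuda versions.

```shell
# Hypothetical helper: print LIST with DIR prepended, unless DIR is
# already one of its colon-separated entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

cuda_dir=/usr/local/cuda-12.3   # assumed install prefix; adjust to your version

# Only touch the environment if the toolkit is actually present.
if [ -d "$cuda_dir/bin" ]; then
    PATH=$(prepend_once "$cuda_dir/bin" "$PATH")
    LD_LIBRARY_PATH=$(prepend_once "$cuda_dir/lib64" "${LD_LIBRARY_PATH:-}")
    export PATH LD_LIBRARY_PATH
fi
```

The directory test means a shell on a machine without the toolkit is left unchanged.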

Related Issues

  • https://github.com/ublue-os/bazzite/issues/749
  • https://github.com/ublue-os/bazzite/issues/796

81reap, Feb 23 '24 03:02