CUDA and CUDNN article should mention Docker installs

Open leviport opened this issue 3 years ago • 2 comments

https://support.system76.com/articles/cuda/

This is quickly becoming "the old way" of doing things. The article should probably talk about how to set up and use CUDA and cuDNN in Docker containers.

Some reference material here: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA

leviport, Mar 28 '22 21:03

@leviport I took a stab at this using my desktop that has a GTX 1070 and Pop!_OS.

  • I used these instructions to install Docker in Pop!_OS: https://linuxhint.com/install-docker-on-pop_os/

  • I found a guide to install the NVIDIA container toolkit in Docker here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

I ran into these problems:

  1. Setting up the package repository and adding the GPG key returns an "Unsupported distribution!" error. Pop!_OS isn't listed here: https://nvidia.github.io/libnvidia-container
 distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add - \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
# Unsupported distribution!
# Check https://nvidia.github.io/libnvidia-container

  2. I ignored the above error and attempted the rest of the installation. Running sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi then failed with the error below:
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.0-base' locally
11.0-base: Pulling from nvidia/cuda
54ee1f796a1e: Pull complete 
f7bfea53ad12: Pull complete 
46d371e02073: Pull complete 
b66c17bbf772: Pull complete 
3642f1a6dfb3: Pull complete 
e5ce55b8b4b9: Pull complete 
155bc0332b0a: Pull complete 
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
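
For anyone reproducing the two errors above, here is a minimal diagnostic sketch. It prints the string the install guide builds from /etc/os-release (on Pop!_OS the ID field is pop, so the result is something like pop22.04, which has no matching directory on the NVIDIA site) and the filesystem type mounted at /sys/fs/cgroup. A value of cgroup2fs means the host uses cgroup v2, which may be related to the "cgroup subsystem devices not found" message; that link is an assumption, not something confirmed in this thread. The helper names distribution_string and cgroup_fs_type are hypothetical and not part of any NVIDIA or Docker tooling.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the "$ID$VERSION_ID" string used by the install guide.
# Takes an optional path to an os-release style file (default: /etc/os-release).
distribution_string() {
    # Source the file in a subshell so ID and VERSION_ID do not leak into this shell.
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper: report the filesystem type mounted at a path
# (default: /sys/fs/cgroup). "cgroup2fs" indicates cgroup v2.
cgroup_fs_type() {
    stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null || echo unknown
}

echo "distribution string: $(distribution_string)"
echo "cgroup filesystem:   $(cgroup_fs_type)"
```

If the first line prints a string that is not listed at https://nvidia.github.io/libnvidia-container, the repository setup step will report an unsupported distribution.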

Here's my nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
| 32%   48C    P0    30W / 151W |   1036MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3077      G   /usr/lib/xorg/Xorg                428MiB |
|    0   N/A  N/A      3288      G   /usr/bin/gnome-shell               31MiB |
|    0   N/A  N/A     26843      G   /usr/lib/firefox/firefox          442MiB |
|    0   N/A  N/A     27082      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A     30358      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A     33556      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A     37010      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A     58902      G   ...AAAAAAAA== --shared-files       18MiB |
|    0   N/A  N/A     59022      G   ...veSuggestionsOnlyOnDemand       37MiB |
|    0   N/A  N/A     59151      G   ...RendererForSitePerProcess       18MiB |
|    0   N/A  N/A     64067      G   ...AAAAAAAA== --shared-files       13MiB |
|    0   N/A  N/A     67964      G   ...AAAAAAAAA= --shared-files       27MiB |
+-----------------------------------------------------------------------------+

Ampersandstorm, Mar 29 '22 21:03

The required Docker dependencies are already packaged in Pop!_OS:

  • https://github.com/pop-os/nvidia-docker
  • https://github.com/pop-os/nvidia-container-toolkit
  • https://github.com/pop-os/libnvidia-container

So adding NVIDIA PPAs is not recommended.
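
As a rough sketch of what that looks like in practice, the script below checks whether the toolkit is already installed from the distribution's own repositories and prints a suggested next command either way. The package name nvidia-container-toolkit is an assumption taken from the repository name above, and is_installed is a hypothetical helper. The script only prints suggestions and does not install anything.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if dpkg reports the package as installed.
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Assumed package name, based on the pop-os/nvidia-container-toolkit repository name.
pkg=nvidia-container-toolkit

if is_installed "$pkg"; then
    echo "$pkg is installed; try: sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi"
else
    echo "$pkg is not installed; try: sudo apt install $pkg"
fi
```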

mmstick, Mar 29 '22 22:03

This has been updated, so I'm closing this. If it is still not correct, please reopen.

ahoneybun, May 24 '23 17:05