nvidia-docker
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown
I recently installed docker and the nvidia cuda tools on a PopOS 22.04 (Ubuntu 22.04) system, and I am attempting to enable GPU access in docker.
❯ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 bash -c "ldconfig; nvidia-smi"
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
I have already attempted a clean install of docker (following the instructions at https://docs.docker.com/engine/install/ubuntu/) and of nvidia-docker2 (following the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide).
Am I missing a step? How can I resolve this?
Here are my particulars:
> lsb_release -a
No LSB modules are available.
Distributor ID: Pop
Description: Pop!_OS 22.04 LTS
Release: 22.04
Codename: jammy
> nvidia-smi
Thu Aug 4 12:16:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 On | N/A |
| 0% 35C P8 11W / 310W | 449MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 4239 G /usr/lib/xorg/Xorg 203MiB |
| 0 N/A N/A 5148 G /usr/bin/gnome-shell 64MiB |
| 0 N/A N/A 6664 G alacritty 10MiB |
| 0 N/A N/A 8064 G firefox 168MiB |
+-----------------------------------------------------------------------------+
> apt list | rg installed | rg docker
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
docker-ce-cli/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed]
docker-ce-rootless-extras/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed,automatic]
docker-ce/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed]
docker-compose-plugin/jammy,now 2.6.0~ubuntu-jammy amd64 [installed]
docker-scan-plugin/jammy,now 0.17.0~ubuntu-jammy amd64 [installed,automatic]
nvidia-docker2/jammy,jammy,now 2.9.0-1~1644261147~22.04~c7639fe all [installed]
> apt list | rg installed | rg nvidia
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
libnvidia-cfg1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-common-515/jammy,jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed all [installed,automatic]
libnvidia-compute-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-compute-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-container-tools/jammy,now 1.8.0-1~1644255740~22.04~76ed4b4 amd64 [installed,automatic]
libnvidia-container1/jammy,now 1.8.0-1~1644255740~22.04~76ed4b4 amd64 [installed,automatic]
libnvidia-decode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-decode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,automatic]
libnvidia-encode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-encode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-extra-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-fbc1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-fbc1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-gl-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-gl-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
nvidia-compute-utils-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-container-toolkit/jammy,now 1.8.0-1pop1~1644260705~22.04~60691e5 amd64 [installed,automatic]
nvidia-dkms-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-docker2/jammy,jammy,now 2.9.0-1~1644261147~22.04~c7639fe all [installed]
nvidia-driver-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-kernel-common-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-kernel-source-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-settings/jammy,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
system76-driver-nvidia/jammy,jammy,now 20.04.60~1659452571~22.04~9ef923b all [installed]
xserver-xorg-video-nvidia-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
https://github.com/NVIDIA/nvidia-docker/issues/1643#issuecomment-1152957965
I appear to be having the same issue even with a later version of the toolkit:
lib-version: 1.11.0
build date: 2022-09-18T23:16+00:00
build revision:
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-CeXONE/libnvidia-container-1.11.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro
I'm having the same issue!
I'm having the same issue, any updates?
From what I've gathered responding to other tickets with this same issue, PopOS appears to compile and distribute its own version of libnvidia-container with WITH_NVCGO=no set at compile time. Without this set to yes (which it is by default), there is no support for cgroupv2, which can result in the error you see here.
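Since the explanation above hinges on cgroupv2, it may help to confirm which cgroup layout the host is actually using. The following is a minimal sketch, assuming GNU stat is available; the helper name cgroup_layout is made up for illustration. A filesystem type of cgroup2fs at /sys/fs/cgroup indicates the unified (v2) hierarchy, while tmpfs usually indicates a v1 or hybrid setup.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type mounted at /sys/fs/cgroup
# to a cgroup layout name.
cgroup_layout() {
  case "$1" in
    cgroup2fs) echo "v2" ;;            # unified hierarchy
    tmpfs)     echo "v1-or-hybrid" ;;  # legacy or hybrid hierarchy
    *)         echo "unknown" ;;
  esac
}

# Ask the kernel which filesystem backs /sys/fs/cgroup; fall back to a
# placeholder if stat cannot answer (e.g. the path does not exist).
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unavailable")
echo "/sys/fs/cgroup is $fs_type -> $(cgroup_layout "$fs_type")"
```

If this reports v2 and the installed libnvidia-container was built without cgroupv2 support, the "cgroup subsystem devices not found" error would be consistent with the explanation above.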
Since PopOS is building this library themselves, even recent versions of the library will appear to exhibit this issue, even if the same version of the official library does not.
Please make sure to override the PopOS repos and pull from the official NVIDIA repos instead.
A community-provided solution for doing so can be found here: https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e
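Before and after switching repositories, it can be useful to check which builds of the container packages are installed. The sketch below is only a heuristic and rests on an assumption drawn from the package listing earlier in this thread: the PopOS builds there carry a version suffix of the form ~timestamp~22.04~hash (for example 1.8.0-1~1644255740~22.04~76ed4b4), which I am assuming the official NVIDIA packages do not. The function name classify_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical heuristic: flag versions that carry the
# "~<timestamp>~<release>~<hash>" suffix seen in the listing above.
classify_version() {
  case "$1" in
    *~[0-9]*~*~*) echo "distro-rebuild" ;;  # matches the PopOS-style suffix
    "")           echo "not-installed" ;;
    *)            echo "plain" ;;           # no rebuild suffix detected
  esac
}

# Only query dpkg where it exists; a missing package yields an empty version.
if command -v dpkg-query >/dev/null 2>&1; then
  for pkg in libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit; do
    ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null || true)
    echo "$pkg: ${ver:-<none>} ($(classify_version "$ver"))"
  done
fi
```

For a more direct answer than the version-string heuristic, apt-cache policy libnvidia-container1 lists the repositories that offer each available version.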