stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
I have configured Docker 19.03.6 and nvidia-docker successfully, but when I run the test
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.
Then I checked nvidia-container-cli, and it seems to report no error:
sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0226 06:26:25.224982 78809 nvc.c:281] initializing library context (version=1.0.2, build=ff40da533db929bf515aca59ba4c701a65a35e6b)
I0226 06:26:25.225050 78809 nvc.c:255] using root /
I0226 06:26:25.225061 78809 nvc.c:256] using ldcache /etc/ld.so.cache
I0226 06:26:25.225071 78809 nvc.c:257] using unprivileged user 65534:65534
I0226 06:26:25.230611 78810 nvc.c:191] loading kernel module nvidia
I0226 06:26:25.230931 78810 nvc.c:203] loading kernel module nvidia_uvm
I0226 06:26:25.231053 78810 nvc.c:211] loading kernel module nvidia_modeset
I0226 06:26:25.231436 78811 driver.c:133] starting driver service
I0226 06:26:25.356687 78809 nvc_info.c:434] requesting driver information with ''
I0226 06:26:25.356983 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.418.87.00
I0226 06:26:25.357280 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.87.00
I0226 06:26:25.357333 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.87.00
I0226 06:26:25.357441 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.87.00
I0226 06:26:25.357512 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.87.00
I0226 06:26:25.357559 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.87.00
I0226 06:26:25.357629 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.87.00
I0226 06:26:25.357711 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.87.00
I0226 06:26:25.357760 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.87.00
I0226 06:26:25.357806 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.87.00
I0226 06:26:25.357868 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.00
I0226 06:26:25.357928 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.87.00
I0226 06:26:25.358002 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.87.00
I0226 06:26:25.358053 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.87.00
I0226 06:26:25.358108 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.87.00
I0226 06:26:25.358179 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.87.00
I0226 06:26:25.358606 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.418.87.00
I0226 06:26:25.358847 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.87.00
I0226 06:26:25.358902 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.87.00
I0226 06:26:25.358951 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.87.00
I0226 06:26:25.359001 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.87.00
W0226 06:26:25.359039 78809 nvc_info.c:303] missing compat32 library libnvidia-ml.so
W0226 06:26:25.359047 78809 nvc_info.c:303] missing compat32 library libnvidia-cfg.so
W0226 06:26:25.359056 78809 nvc_info.c:303] missing compat32 library libcuda.so
W0226 06:26:25.359066 78809 nvc_info.c:303] missing compat32 library libnvidia-opencl.so
W0226 06:26:25.359076 78809 nvc_info.c:303] missing compat32 library libnvidia-ptxjitcompiler.so
W0226 06:26:25.359086 78809 nvc_info.c:303] missing compat32 library libnvidia-fatbinaryloader.so
W0226 06:26:25.359097 78809 nvc_info.c:303] missing compat32 library libnvidia-compiler.so
W0226 06:26:25.359107 78809 nvc_info.c:303] missing compat32 library libvdpau_nvidia.so
W0226 06:26:25.359117 78809 nvc_info.c:303] missing compat32 library libnvidia-encode.so
W0226 06:26:25.359128 78809 nvc_info.c:303] missing compat32 library libnvidia-opticalflow.so
W0226 06:26:25.359138 78809 nvc_info.c:303] missing compat32 library libnvcuvid.so
W0226 06:26:25.359149 78809 nvc_info.c:303] missing compat32 library libnvidia-eglcore.so
W0226 06:26:25.359159 78809 nvc_info.c:303] missing compat32 library libnvidia-glcore.so
W0226 06:26:25.359169 78809 nvc_info.c:303] missing compat32 library libnvidia-tls.so
W0226 06:26:25.359177 78809 nvc_info.c:303] missing compat32 library libnvidia-glsi.so
W0226 06:26:25.359186 78809 nvc_info.c:303] missing compat32 library libnvidia-fbc.so
W0226 06:26:25.359194 78809 nvc_info.c:303] missing compat32 library libnvidia-ifr.so
W0226 06:26:25.359203 78809 nvc_info.c:303] missing compat32 library libGLX_nvidia.so
W0226 06:26:25.359212 78809 nvc_info.c:303] missing compat32 library libEGL_nvidia.so
W0226 06:26:25.359220 78809 nvc_info.c:303] missing compat32 library libGLESv2_nvidia.so
W0226 06:26:25.359253 78809 nvc_info.c:303] missing compat32 library libGLESv1_CM_nvidia.so
I0226 06:26:25.359527 78809 nvc_info.c:229] selecting /usr/bin/nvidia-smi
I0226 06:26:25.359560 78809 nvc_info.c:229] selecting /usr/bin/nvidia-debugdump
I0226 06:26:25.359585 78809 nvc_info.c:229] selecting /usr/bin/nvidia-persistenced
I0226 06:26:25.359608 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-control
I0226 06:26:25.359632 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-server
I0226 06:26:25.359667 78809 nvc_info.c:366] listing device /dev/nvidiactl
I0226 06:26:25.359676 78809 nvc_info.c:366] listing device /dev/nvidia-uvm
I0226 06:26:25.359687 78809 nvc_info.c:366] listing device /dev/nvidia-uvm-tools
I0226 06:26:25.359697 78809 nvc_info.c:366] listing device /dev/nvidia-modeset
W0226 06:26:25.359731 78809 nvc_info.c:274] missing ipc /var/run/nvidia-persistenced/socket
W0226 06:26:25.359753 78809 nvc_info.c:274] missing ipc /tmp/nvidia-mps
I0226 06:26:25.359763 78809 nvc_info.c:490] requesting device information with ''
I0226 06:26:25.366457 78809 nvc_info.c:520] listing device /dev/nvidia0 (GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 at 00000000:05:00.0)
I0226 06:26:25.373129 78809 nvc_info.c:520] listing device /dev/nvidia1 (GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 at 00000000:06:00.0)
I0226 06:26:25.380167 78809 nvc_info.c:520] listing device /dev/nvidia2 (GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 at 00000000:85:00.0)
I0226 06:26:25.387215 78809 nvc_info.c:520] listing device /dev/nvidia3 (GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb at 00000000:86:00.0)
NVRM version: 418.87.00
CUDA version: 10.1
Device Index: 0 Device Minor: 0 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 Bus Location: 00000000:05:00.0 Architecture: 3.7
Device Index: 1 Device Minor: 1 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 Bus Location: 00000000:06:00.0 Architecture: 3.7
Device Index: 2 Device Minor: 2 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 Bus Location: 00000000:85:00.0 Architecture: 3.7
Device Index: 3 Device Minor: 3 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb Bus Location: 00000000:86:00.0 Architecture: 3.7
I0226 06:26:25.387330 78809 nvc.c:318] shutting down library context
I0226 06:26:25.388428 78811 driver.c:192] terminating driver service
I0226 06:26:25.440777 78809 driver.c:233] driver service terminated successfully
Is the NVIDIA driver version too low? In fact, 418.87.00 is what the official NVIDIA site recommends. Also, how can I update the driver via apt instead of manually with the .run file? I do not know how to make it work. Can anyone help me?
I also reinstalled the NVIDIA driver with NVIDIA-Linux-x86_64-440.33.01.run and hit the same error.
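A minimal sketch of switching to an apt-managed driver on Ubuntu (package names and the uninstaller path are assumptions; the driver version actually available depends on your release and whether the graphics-drivers PPA is enabled):
# remove the previously .run-installed driver first (the .run installer provides this uninstaller)
$ sudo /usr/bin/nvidia-uninstall
# let Ubuntu pick a matching driver package automatically (requires ubuntu-drivers-common)
$ sudo ubuntu-drivers autoinstall
# or install a specific version explicitly, e.g.:
$ sudo apt-get install nvidia-driver-440
$ sudo reboot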
Same problem here on:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
My docker is Docker version 19.03.6, build 369ce74a3c
and I installed nvidia driver from here.
When I run
sudo nvidia-container-cli -k -d /dev/tty info
The output is
I0228 09:13:49.695833 1120 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0228 09:13:49.695933 1120 nvc.c:255] using root /
I0228 09:13:49.695948 1120 nvc.c:256] using ldcache /etc/ld.so.cache
I0228 09:13:49.695958 1120 nvc.c:257] using unprivileged user 65534:65534
I0228 09:13:49.696847 1121 nvc.c:191] loading kernel module nvidia
E0228 09:13:50.186352 1121 nvc.c:193] could not load kernel module nvidia
I0228 09:13:50.186425 1121 nvc.c:203] loading kernel module nvidia_uvm
E0228 09:13:50.628481 1121 nvc.c:205] could not load kernel module nvidia_uvm
I0228 09:13:50.628508 1121 nvc.c:211] loading kernel module nvidia_modeset
E0228 09:13:51.064044 1121 nvc.c:213] could not load kernel module nvidia_modeset
I0228 09:13:51.064251 1129 driver.c:133] starting driver service
I0228 09:13:51.066557 1120 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: cuda error: unknown error
the output of my attempt to run
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
is as follows
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/defdc438de52aef6ec0266539ea834320a9580f75bac6b71cfd2d2e3c999aae9/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled
any idea?
@soheilade have you solved the problem?
Yeah, try reinstalling the NVIDIA driver from here and run this Docker command to launch the CARLA server in a container:
docker run -p 2000-2002:2000-2002 --rm -d -it -e NVIDIA_VISIBLE_DEVICES=0 --runtime nvidia carlasim/carla:0.9.5 ./CarlaUE4.sh /Game/Maps/Town01
This points to an error with the driver. Can you install the CUDA samples on the host machine and try to run, for example, deviceQuery?
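For reference, a minimal way to build and run deviceQuery, assuming the CUDA samples were installed alongside the toolkit under /usr/local/cuda/samples (the exact path varies by CUDA version):
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery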
@RenaudWasTaken I think I have installed the driver successfully; I can use TensorFlow 1.14.0 on the host machine. I ran the following commands: 1. cat /proc/driver/nvidia/version shows:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC 2019 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
2. sudo dpkg --list | grep nvidia-* shows:
iU libnvidia-container-tools 1.0.7-1 amd64 NVIDIA container runtime library (command-line tools)
iU libnvidia-container1:amd64 1.0.7-1 amd64 NVIDIA container runtime library
3. Running deviceQuery shows:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 4 CUDA Capable device(s)
Device 0: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 6 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 2: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 133 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 3: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 134 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 4 Result = PASS
What is wrong with this information? I cannot find anything.
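(As an aside: the iU status in the dpkg output above means libnvidia-container-tools and libnvidia-container1 are unpacked but not yet configured. A hedged guess at finishing their configuration with standard dpkg/apt commands, in case that is contributing to the error:)
$ sudo dpkg --configure -a
$ sudo apt-get install -f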
Ok, nothing wrong with CUDA. The other two that might help are:
- vectorAddDrv
- nvidia-bug-report.sh
- In fact, I do not know how to use vectorAddDrv. I ran cd /usr/local/cuda/samples/0_Simple/vectorAddDrv and then sudo make, which generated vectorAddDrv (see the sketch after this list).
- sudo nvidia-bug-report.sh generated nvidia-bug-report.log.gz, with some errors such as: ff:15.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers [8086:2fb6] (rev 02)
I think this is not a critical error; what information should I look at?
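A minimal sketch of building and running the vectorAddDrv sample, assuming the CUDA samples are installed under /usr/local/cuda/samples as above (it launches a vector-add kernel through the CUDA driver API and prints Result = PASS on success):
$ cd /usr/local/cuda/samples/0_Simple/vectorAddDrv
$ sudo make
$ ./vectorAddDrv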
@RenaudWasTaken The problem has not been solved for me. Can you give me further help?
Same error occurs for me~~~heh
Same problem I am facing as well
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
tx2-01:~$ uname -a
Linux jetson-tx2-01 4.9.140-tegra #1 SMP PREEMPT Mon Aug 12 21:29:52 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux
tx2-01:~$ sudo nvidia-container-cli -k -d /dev/tty info [sudo] password for civilmaps:
-- WARNING, the following logs are for debugging purposes only --
I0609 06:28:32.004669 8657 nvc.c:281] initializing library context (version=1.1.1, build=e5d6156aba457559979597c8e3d22c5d8d0622db)
I0609 06:28:32.004901 8657 nvc.c:255] using root /
I0609 06:28:32.004930 8657 nvc.c:256] using ldcache /etc/ld.so.cache
I0609 06:28:32.004947 8657 nvc.c:257] using unprivileged user 65534:65534
W0609 06:28:32.005415 8657 nvc.c:171] failed to detect NVIDIA devices
I0609 06:28:32.005723 8658 nvc.c:191] loading kernel module nvidia
E0609 06:28:32.006013 8658 nvc.c:193] could not load kernel module nvidia
I0609 06:28:32.006037 8658 nvc.c:203] loading kernel module nvidia_uvm
E0609 06:28:32.006142 8658 nvc.c:205] could not load kernel module nvidia_uvm
I0609 06:28:32.006161 8658 nvc.c:211] loading kernel module nvidia_modeset
E0609 06:28:32.006259 8658 nvc.c:213] could not load kernel module nvidia_modeset
I0609 06:28:32.007119 8659 driver.c:101] starting driver service
E0609 06:28:32.009737 8659 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0609 06:28:32.010706 8657 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
tx2-01:~$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
@harendracmaps Have you solved your issue? I'm having the same exact error except I'm running it on an NVIDIA Xavier AGX
Running on the following specs:
- Ubuntu 18.04.3 LTS (Bionic Beaver)
- Docker version 19.03.12
- NVIDIA Docker version 2.0.3
- Jetpack version 4.3 (ARM64)
- CUDA version 10.0.326
nvidia@x02:~$ uname -a
Linux x02 4.9.140-tegra #1 SMP PREEMPT Mon Dec 9 22:52:02 PST 2019 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
nvidia@x02:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326
nvidia@x02:~$ dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-================================-=====================-=====================-======================================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-container-tools 1.2.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container0:arm64 0.9.0~beta.1 arm64 NVIDIA container runtime library
ii libnvidia-container1:arm64 1.2.0-1 arm64 NVIDIA container runtime library
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
ii nvidia-container-csv-cuda 10.0.326-1 arm64 Jetpack CUDA CSV file
ii nvidia-container-csv-cudnn 7.6.3.28-1+cuda10.0 arm64 Jetpack CUDNN CSV file
ii nvidia-container-csv-tensorrt 6.0.1.10-1+cuda10.0 arm64 Jetpack TensorRT CSV file
ii nvidia-container-csv-visionworks 1.6.0.500n arm64 Jetpack VisionWorks CSV file
ii nvidia-container-runtime 3.1.0-1 arm64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.2.1-1 arm64 NVIDIA container runtime hook
un nvidia-cuda-dev <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.2.0-1 all nvidia-docker CLI wrapper
ii nvidia-l4t-3d-core 32.3.1-20191209230245 arm64 NVIDIA GL EGL Package
ii nvidia-l4t-apt-source 32.3.1-20191209230245 arm64 NVIDIA L4T apt source list debian package
ii nvidia-l4t-bootloader 32.3.1-20191209230245 arm64 NVIDIA Bootloader Package
ii nvidia-l4t-camera 32.3.1-20191209230245 arm64 NVIDIA Camera Package
ii nvidia-l4t-ccp-t186ref 32.3.1-20191209230245 arm64 NVIDIA Compatibility Checking Package
ii nvidia-l4t-configs 32.3.1-20191209230245 arm64 NVIDIA configs debian package
ii nvidia-l4t-core 32.3.1-20191209230245 arm64 NVIDIA Core Package
ii nvidia-l4t-cuda 32.3.1-20191209230245 arm64 NVIDIA CUDA Package
ii nvidia-l4t-firmware 32.3.1-20191209230245 arm64 NVIDIA Firmware Package
ii nvidia-l4t-graphics-demos 32.3.1-20191209230245 arm64 NVIDIA graphics demo applications
ii nvidia-l4t-gstreamer 32.3.1-20191209230245 arm64 NVIDIA GST Application files
ii nvidia-l4t-init 32.3.1-20191209230245 arm64 NVIDIA Init debian package
ii nvidia-l4t-initrd 32.3.1-20191209230245 arm64 NVIDIA initrd debian package
ii nvidia-l4t-jetson-io 32.3.1-20191209230245 arm64 NVIDIA Jetson.IO debian package
ii nvidia-l4t-jetson-multimedia-api 32.3.1-20191209230245 arm64 NVIDIA Jetson Multimedia API is a collection of lower-level APIs that
ii nvidia-l4t-kernel 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel Package
ii nvidia-l4t-kernel-dtbs 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel DTB Package
ii nvidia-l4t-kernel-headers 4.9.140-tegra-32.3.1- arm64 NVIDIA Linux Tegra Kernel Headers Package
ii nvidia-l4t-multimedia 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-multimedia-utils 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-oem-config 32.3.1-20191209230245 arm64 NVIDIA OEM-Config Package
ii nvidia-l4t-tools 32.3.1-20191209230245 arm64 NVIDIA Public Test Tools Package
ii nvidia-l4t-wayland 32.3.1-20191209230245 arm64 NVIDIA Wayland Package
ii nvidia-l4t-weston 32.3.1-20191209230245 arm64 NVIDIA Weston Package
ii nvidia-l4t-x11 32.3.1-20191209230245 arm64 NVIDIA X11 Package
ii nvidia-l4t-xusb-firmware 32.3.1-20191209230245 arm64 NVIDIA USB Firmware Package
un nvidia-libopencl1-dev <none> <none> (no description available)
un nvidia-prime <none> <none> (no description available)
if you get this error on a Jetson board:
could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Then it means you've installed nvidia-container-toolkit from the official repos (https://nvidia.github.io/nvidia-docker).
nvidia-container-toolkit does not support Jetson right now, but there is a beta version in the jetpack repos that does.
Remove the nvidia-docker repo, then reinstall nvidia-container-runtime and nvidia-jetpack.
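A hedged sketch of that cleanup on a Jetson (the repo list filename is an assumption based on the standard nvidia-docker install instructions; adjust if your system used a different name):
$ sudo rm /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install --reinstall nvidia-container-runtime nvidia-jetpack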
Thank you!!
Upgrading nvidia-docker to nvidia-docker2-2.5.0 solved the problem perfectly.
CUDA Version: 11.0, docker-ce: 19.03.7, nvidia-docker2: 2.5.0-1
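For reference, a hedged sketch of that upgrade via apt (the pinned version string assumes 2.5.0-1 is available in your configured repo):
$ sudo apt-get update
$ sudo apt-get install nvidia-docker2=2.5.0-1
$ sudo systemctl restart docker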
@mildsunrise It seems nvidia-docker supports jetson but I am still getting this error even with nvidia-docker2-2.5.0-1
what makes you think nvidia-docker supports jetson? the FAQ still says you need the SDK manager (aka the jetson repos). you need the version of nvidia-docker2 that comes with the jetson repos, not the nvidia-docker one
@mildsunrise ah you mean "jetpack" by "jetson repos" don't you? So that might very well be my issue. I am using the stock kernel ConnectTech provides and presumed because it had L4T 32.4.4 installed that it had Jetpack 4.2.2 installed, but I think I need to reflash it because the manufacturer probably does just a minimal install for QA purposes.
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson
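One hedged way to check which repo the installed packages actually came from (so the L4T/JetPack build of nvidia-docker2 can be told apart from the one on nvidia.github.io):
$ apt-cache policy nvidia-docker2
$ apt-cache policy nvidia-container-runtime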
I am getting the same error with nvidia-container-toolkit/bionic,now 1.5.1-1 amd64 under Ubuntu Server 20.04 LTS, running headless.
I installed the NVIDIA drivers via the .run file downloaded from the official NVIDIA page, and nvidia-smi is working, as is the hashcat benchmark.
However, when I run docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi with docker-ce/focal,now 5:20.10.7~3-0~ubuntu-focal amd64, I get the aforementioned error docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I already tried newer drivers from the repositories up until version 470, but nothing worked.
Any ideas?
It seems that NVidia continues ignoring Linux support
@AlexAshs sorry for the delay in getting back to you. Would you mind creating a new ticket and including the debug output from /var/log/nvidia-container-toolkit.log? This logging can be enabled by uncommenting the #debug= line in the nvidia-container-cli section of the /etc/nvidia-container-runtime/config.toml file.
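For example, after uncommenting, the relevant lines of /etc/nvidia-container-runtime/config.toml should look roughly like this (the path shown is the default debug log location assumed here):
[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
Then re-run the failing docker run command and attach the resulting log to the new issue.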
The reason I ask for a new issue is that this one has gotten quite long and seems to contain a mix of issues related to Jetson platforms and others that have been marked as fixed.
@elezar No worries, I was just trying things out; since this is my first dedicated GPU, nothing is in production just yet :D I have posted my issue here: Containers with gpus not starting up. I really don't post issues often, I prefer finding solutions first, so if there is something missing or the title sucks, just let me know so I can provide what is necessary to tackle this.
I have the exact same problem.
Configuration: Host: Windows 10 with WSL2, with CUDA installed.
Error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Command: docker run --gpus all --cpus 2 --name test -it pytorch/pytorch
Hardware: NVidia GeForce GTX 1660 TI
Any solution to this problem?
Are you installing updates from the windows insider dev channel? It seems to be a requirement for this setup to work.
@AlexAshs I am not signed up for the Insiders program. Could you tell me which update is necessary? I will install it manually.
@TheMarshalMole I found this guide, that should make things easier for you: https://www.forecr.io/blogs/installation/nvidia-docker-installation-for-ubuntu-in-wsl-2
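Roughly, the steps in that guide reduce to the following inside the WSL2 Ubuntu distribution (a hedged sketch following the standard nvidia-docker repository setup; details may have changed since):
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo service docker restart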
I installed my GPU driver from Software & Updates >> Additional Drivers and it solved my problem.