stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
I have configured Docker 19.03.6 and nvidia-docker successfully, but when I run the test
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.
Then I checked nvidia-container-cli, and it seems to report no error:
sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0226 06:26:25.224982 78809 nvc.c:281] initializing library context (version=1.0.2, build=ff40da533db929bf515aca59ba4c701a65a35e6b)
I0226 06:26:25.225050 78809 nvc.c:255] using root /
I0226 06:26:25.225061 78809 nvc.c:256] using ldcache /etc/ld.so.cache
I0226 06:26:25.225071 78809 nvc.c:257] using unprivileged user 65534:65534
I0226 06:26:25.230611 78810 nvc.c:191] loading kernel module nvidia
I0226 06:26:25.230931 78810 nvc.c:203] loading kernel module nvidia_uvm
I0226 06:26:25.231053 78810 nvc.c:211] loading kernel module nvidia_modeset
I0226 06:26:25.231436 78811 driver.c:133] starting driver service
I0226 06:26:25.356687 78809 nvc_info.c:434] requesting driver information with ''
I0226 06:26:25.356983 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.418.87.00
I0226 06:26:25.357280 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.87.00
I0226 06:26:25.357333 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.87.00
I0226 06:26:25.357441 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.87.00
I0226 06:26:25.357512 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.87.00
I0226 06:26:25.357559 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.87.00
I0226 06:26:25.357629 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.87.00
I0226 06:26:25.357711 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.87.00
I0226 06:26:25.357760 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.87.00
I0226 06:26:25.357806 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.87.00
I0226 06:26:25.357868 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.00
I0226 06:26:25.357928 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.87.00
I0226 06:26:25.358002 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.87.00
I0226 06:26:25.358053 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.87.00
I0226 06:26:25.358108 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.87.00
I0226 06:26:25.358179 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.87.00
I0226 06:26:25.358606 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.418.87.00
I0226 06:26:25.358847 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.87.00
I0226 06:26:25.358902 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.87.00
I0226 06:26:25.358951 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.87.00
I0226 06:26:25.359001 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.87.00
W0226 06:26:25.359039 78809 nvc_info.c:303] missing compat32 library libnvidia-ml.so
W0226 06:26:25.359047 78809 nvc_info.c:303] missing compat32 library libnvidia-cfg.so
W0226 06:26:25.359056 78809 nvc_info.c:303] missing compat32 library libcuda.so
W0226 06:26:25.359066 78809 nvc_info.c:303] missing compat32 library libnvidia-opencl.so
W0226 06:26:25.359076 78809 nvc_info.c:303] missing compat32 library libnvidia-ptxjitcompiler.so
W0226 06:26:25.359086 78809 nvc_info.c:303] missing compat32 library libnvidia-fatbinaryloader.so
W0226 06:26:25.359097 78809 nvc_info.c:303] missing compat32 library libnvidia-compiler.so
W0226 06:26:25.359107 78809 nvc_info.c:303] missing compat32 library libvdpau_nvidia.so
W0226 06:26:25.359117 78809 nvc_info.c:303] missing compat32 library libnvidia-encode.so
W0226 06:26:25.359128 78809 nvc_info.c:303] missing compat32 library libnvidia-opticalflow.so
W0226 06:26:25.359138 78809 nvc_info.c:303] missing compat32 library libnvcuvid.so
W0226 06:26:25.359149 78809 nvc_info.c:303] missing compat32 library libnvidia-eglcore.so
W0226 06:26:25.359159 78809 nvc_info.c:303] missing compat32 library libnvidia-glcore.so
W0226 06:26:25.359169 78809 nvc_info.c:303] missing compat32 library libnvidia-tls.so
W0226 06:26:25.359177 78809 nvc_info.c:303] missing compat32 library libnvidia-glsi.so
W0226 06:26:25.359186 78809 nvc_info.c:303] missing compat32 library libnvidia-fbc.so
W0226 06:26:25.359194 78809 nvc_info.c:303] missing compat32 library libnvidia-ifr.so
W0226 06:26:25.359203 78809 nvc_info.c:303] missing compat32 library libGLX_nvidia.so
W0226 06:26:25.359212 78809 nvc_info.c:303] missing compat32 library libEGL_nvidia.so
W0226 06:26:25.359220 78809 nvc_info.c:303] missing compat32 library libGLESv2_nvidia.so
W0226 06:26:25.359253 78809 nvc_info.c:303] missing compat32 library libGLESv1_CM_nvidia.so
I0226 06:26:25.359527 78809 nvc_info.c:229] selecting /usr/bin/nvidia-smi
I0226 06:26:25.359560 78809 nvc_info.c:229] selecting /usr/bin/nvidia-debugdump
I0226 06:26:25.359585 78809 nvc_info.c:229] selecting /usr/bin/nvidia-persistenced
I0226 06:26:25.359608 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-control
I0226 06:26:25.359632 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-server
I0226 06:26:25.359667 78809 nvc_info.c:366] listing device /dev/nvidiactl
I0226 06:26:25.359676 78809 nvc_info.c:366] listing device /dev/nvidia-uvm
I0226 06:26:25.359687 78809 nvc_info.c:366] listing device /dev/nvidia-uvm-tools
I0226 06:26:25.359697 78809 nvc_info.c:366] listing device /dev/nvidia-modeset
W0226 06:26:25.359731 78809 nvc_info.c:274] missing ipc /var/run/nvidia-persistenced/socket
W0226 06:26:25.359753 78809 nvc_info.c:274] missing ipc /tmp/nvidia-mps
I0226 06:26:25.359763 78809 nvc_info.c:490] requesting device information with ''
I0226 06:26:25.366457 78809 nvc_info.c:520] listing device /dev/nvidia0 (GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 at 00000000:05:00.0)
I0226 06:26:25.373129 78809 nvc_info.c:520] listing device /dev/nvidia1 (GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 at 00000000:06:00.0)
I0226 06:26:25.380167 78809 nvc_info.c:520] listing device /dev/nvidia2 (GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 at 00000000:85:00.0)
I0226 06:26:25.387215 78809 nvc_info.c:520] listing device /dev/nvidia3 (GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb at 00000000:86:00.0)
NVRM version: 418.87.00
CUDA version: 10.1
Device Index: 0 Device Minor: 0 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 Bus Location: 00000000:05:00.0 Architecture: 3.7
Device Index: 1 Device Minor: 1 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 Bus Location: 00000000:06:00.0 Architecture: 3.7
Device Index: 2 Device Minor: 2 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 Bus Location: 00000000:85:00.0 Architecture: 3.7
Device Index: 3 Device Minor: 3 Model: Tesla K80 Brand: Tesla GPU UUID: GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb Bus Location: 00000000:86:00.0 Architecture: 3.7
I0226 06:26:25.387330 78809 nvc.c:318] shutting down library context
I0226 06:26:25.388428 78811 driver.c:192] terminating driver service
I0226 06:26:25.440777 78809 driver.c:233] driver service terminated successfully
Is the NVIDIA driver version too low? In fact, 418.87.00 is what the official NVIDIA site recommends. Also, how can I update the driver via apt instead of manually with the .run file? I do not know how to make it work. Can anyone help me?
I also reinstalled the NVIDIA driver with NVIDIA-Linux-x86_64-440.33.01.run and hit the same error.
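A minimal sketch of switching to an apt-managed driver on Ubuntu (package names and the uninstaller path are assumptions; the driver version actually available depends on your release and whether the graphics-drivers PPA is enabled):
# remove the previously .run-installed driver first (the .run installer provides this uninstaller)
$ sudo /usr/bin/nvidia-uninstall
# let Ubuntu pick a matching driver package automatically (requires ubuntu-drivers-common)
$ sudo ubuntu-drivers autoinstall
# or install a specific version explicitly, e.g.:
$ sudo apt-get install nvidia-driver-440
$ sudo reboot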
Same problem here on:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
My docker is Docker version 19.03.6, build 369ce74a3c
and I installed nvidia driver from here.
When I run
sudo nvidia-container-cli -k -d /dev/tty info
The output is
I0228 09:13:49.695833 1120 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0228 09:13:49.695933 1120 nvc.c:255] using root /
I0228 09:13:49.695948 1120 nvc.c:256] using ldcache /etc/ld.so.cache
I0228 09:13:49.695958 1120 nvc.c:257] using unprivileged user 65534:65534
I0228 09:13:49.696847 1121 nvc.c:191] loading kernel module nvidia
E0228 09:13:50.186352 1121 nvc.c:193] could not load kernel module nvidia
I0228 09:13:50.186425 1121 nvc.c:203] loading kernel module nvidia_uvm
E0228 09:13:50.628481 1121 nvc.c:205] could not load kernel module nvidia_uvm
I0228 09:13:50.628508 1121 nvc.c:211] loading kernel module nvidia_modeset
E0228 09:13:51.064044 1121 nvc.c:213] could not load kernel module nvidia_modeset
I0228 09:13:51.064251 1129 driver.c:133] starting driver service
I0228 09:13:51.066557 1120 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: cuda error: unknown error
the output of my attempt to run
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
is as follows
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/defdc438de52aef6ec0266539ea834320a9580f75bac6b71cfd2d2e3c999aae9/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled
any idea?
@soheilade have you solved the problem?
Yeah, try reinstalling the NVIDIA driver from here and run this Docker command to launch the CARLA server in a container:
docker run -p 2000-2002:2000-2002 --rm -d -it -e NVIDIA_VISIBLE_DEVICES=0 --runtime nvidia carlasim/carla:0.9.5 ./CarlaUE4.sh /Game/Maps/Town01
This points to an error with the driver. Can you install the CUDA samples on the host machine and try to run, for example, deviceQuery?
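For reference, a minimal way to build and run deviceQuery, assuming the CUDA samples were installed alongside the toolkit under /usr/local/cuda/samples (the exact path varies by CUDA version):
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery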
@RenaudWasTaken I think I have installed the driver successfully; I can use TensorFlow 1.14.0 on the host machine. I ran the following commands: 1. cat /proc/driver/nvidia/version shows:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC 2019 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
2. sudo dpkg --list | grep nvidia-* shows:
iU libnvidia-container-tools 1.0.7-1 amd64 NVIDIA container runtime library (command-line tools)
iU libnvidia-container1:amd64 1.0.7-1 amd64 NVIDIA container runtime library
3. Running deviceQuery shows:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 4 CUDA Capable device(s)
Device 0: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 6 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 2: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 133 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 3: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 134 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 4 Result = PASS
What is wrong with this information? I cannot find anything.
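(As an aside: the iU status in the dpkg output above means libnvidia-container-tools and libnvidia-container1 are unpacked but not yet configured. A hedged guess at finishing their configuration with standard dpkg/apt commands, in case that is contributing to the error:)
$ sudo dpkg --configure -a
$ sudo apt-get install -f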
Ok, nothing wrong with CUDA. The other two that might help are:
- vectorAddDrv
- nvidia-bug-report.sh
- In fact, I do not know how to use vectorAddDrv. I ran cd /usr/local/cuda/samples/0_Simple/vectorAddDrv and then sudo make, which generated vectorAddDrv (see the sketch after this list).
- sudo nvidia-bug-report.sh generated nvidia-bug-report.log.gz, with some errors such as: ff:15.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers [8086:2fb6] (rev 02)
I think this is not a critical error; what information should I look at?
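A minimal sketch of building and running the vectorAddDrv sample, assuming the CUDA samples are installed under /usr/local/cuda/samples as above (it launches a vector-add kernel through the CUDA driver API and prints Result = PASS on success):
$ cd /usr/local/cuda/samples/0_Simple/vectorAddDrv
$ sudo make
$ ./vectorAddDrv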
@RenaudWasTaken The problem has not been solved for me. Can you give me further help?
Same error occurs for me~~~heh
Same problem I am facing as well
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
tx2-01:~$ uname -a
Linux jetson-tx2-01 4.9.140-tegra #1 SMP PREEMPT Mon Aug 12 21:29:52 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux
tx2-01:~$ sudo nvidia-container-cli -k -d /dev/tty info [sudo] password for civilmaps:
-- WARNING, the following logs are for debugging purposes only --
I0609 06:28:32.004669 8657 nvc.c:281] initializing library context (version=1.1.1, build=e5d6156aba457559979597c8e3d22c5d8d0622db)
I0609 06:28:32.004901 8657 nvc.c:255] using root /
I0609 06:28:32.004930 8657 nvc.c:256] using ldcache /etc/ld.so.cache
I0609 06:28:32.004947 8657 nvc.c:257] using unprivileged user 65534:65534
W0609 06:28:32.005415 8657 nvc.c:171] failed to detect NVIDIA devices
I0609 06:28:32.005723 8658 nvc.c:191] loading kernel module nvidia
E0609 06:28:32.006013 8658 nvc.c:193] could not load kernel module nvidia
I0609 06:28:32.006037 8658 nvc.c:203] loading kernel module nvidia_uvm
E0609 06:28:32.006142 8658 nvc.c:205] could not load kernel module nvidia_uvm
I0609 06:28:32.006161 8658 nvc.c:211] loading kernel module nvidia_modeset
E0609 06:28:32.006259 8658 nvc.c:213] could not load kernel module nvidia_modeset
I0609 06:28:32.007119 8659 driver.c:101] starting driver service
E0609 06:28:32.009737 8659 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0609 06:28:32.010706 8657 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
tx2-01:~$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
@harendracmaps Have you solved your issue? I'm having the same exact error except I'm running it on an NVIDIA Xavier AGX
Running on the following specs:
- Ubuntu 18.04.3 LTS (Bionic Beaver)
- Docker version 19.03.12
- NVIDIA Docker version 2.0.3
- Jetpack version 4.3 (ARM64)
- CUDA version 10.0.326
nvidia@x02:~$ uname -a
Linux x02 4.9.140-tegra #1 SMP PREEMPT Mon Dec 9 22:52:02 PST 2019 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
nvidia@x02:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326
nvidia@x02:~$ dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-================================-=====================-=====================-======================================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-container-tools 1.2.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container0:arm64 0.9.0~beta.1 arm64 NVIDIA container runtime library
ii libnvidia-container1:arm64 1.2.0-1 arm64 NVIDIA container runtime library
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
ii nvidia-container-csv-cuda 10.0.326-1 arm64 Jetpack CUDA CSV file
ii nvidia-container-csv-cudnn 7.6.3.28-1+cuda10.0 arm64 Jetpack CUDNN CSV file
ii nvidia-container-csv-tensorrt 6.0.1.10-1+cuda10.0 arm64 Jetpack TensorRT CSV file
ii nvidia-container-csv-visionworks 1.6.0.500n arm64 Jetpack VisionWorks CSV file
ii nvidia-container-runtime 3.1.0-1 arm64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.2.1-1 arm64 NVIDIA container runtime hook
un nvidia-cuda-dev <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.2.0-1 all nvidia-docker CLI wrapper
ii nvidia-l4t-3d-core 32.3.1-20191209230245 arm64 NVIDIA GL EGL Package
ii nvidia-l4t-apt-source 32.3.1-20191209230245 arm64 NVIDIA L4T apt source list debian package
ii nvidia-l4t-bootloader 32.3.1-20191209230245 arm64 NVIDIA Bootloader Package
ii nvidia-l4t-camera 32.3.1-20191209230245 arm64 NVIDIA Camera Package
ii nvidia-l4t-ccp-t186ref 32.3.1-20191209230245 arm64 NVIDIA Compatibility Checking Package
ii nvidia-l4t-configs 32.3.1-20191209230245 arm64 NVIDIA configs debian package
ii nvidia-l4t-core 32.3.1-20191209230245 arm64 NVIDIA Core Package
ii nvidia-l4t-cuda 32.3.1-20191209230245 arm64 NVIDIA CUDA Package
ii nvidia-l4t-firmware 32.3.1-20191209230245 arm64 NVIDIA Firmware Package
ii nvidia-l4t-graphics-demos 32.3.1-20191209230245 arm64 NVIDIA graphics demo applications
ii nvidia-l4t-gstreamer 32.3.1-20191209230245 arm64 NVIDIA GST Application files
ii nvidia-l4t-init 32.3.1-20191209230245 arm64 NVIDIA Init debian package
ii nvidia-l4t-initrd 32.3.1-20191209230245 arm64 NVIDIA initrd debian package
ii nvidia-l4t-jetson-io 32.3.1-20191209230245 arm64 NVIDIA Jetson.IO debian package
ii nvidia-l4t-jetson-multimedia-api 32.3.1-20191209230245 arm64 NVIDIA Jetson Multimedia API is a collection of lower-level APIs that
ii nvidia-l4t-kernel 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel Package
ii nvidia-l4t-kernel-dtbs 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel DTB Package
ii nvidia-l4t-kernel-headers 4.9.140-tegra-32.3.1- arm64 NVIDIA Linux Tegra Kernel Headers Package
ii nvidia-l4t-multimedia 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-multimedia-utils 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-oem-config 32.3.1-20191209230245 arm64 NVIDIA OEM-Config Package
ii nvidia-l4t-tools 32.3.1-20191209230245 arm64 NVIDIA Public Test Tools Package
ii nvidia-l4t-wayland 32.3.1-20191209230245 arm64 NVIDIA Wayland Package
ii nvidia-l4t-weston 32.3.1-20191209230245 arm64 NVIDIA Weston Package
ii nvidia-l4t-x11 32.3.1-20191209230245 arm64 NVIDIA X11 Package
ii nvidia-l4t-xusb-firmware 32.3.1-20191209230245 arm64 NVIDIA USB Firmware Package
un nvidia-libopencl1-dev <none> <none> (no description available)
un nvidia-prime <none> <none> (no description available)
if you get this error on a Jetson board:
could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Then it means you've installed nvidia-container-toolkit from the official repos (https://nvidia.github.io/nvidia-docker).
nvidia-container-toolkit does not support Jetson right now, but there is a beta version in the jetpack repos that does.
Remove the nvidia-docker repo, then reinstall nvidia-container-runtime and nvidia-jetpack.
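A hedged sketch of that cleanup on a Jetson (the repo list filename is an assumption based on the standard nvidia-docker install instructions; adjust if your system used a different name):
$ sudo rm /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install --reinstall nvidia-container-runtime nvidia-jetpack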
Thank you!!
Upgrading nvidia-docker to nvidia-docker2-2.5.0 solved the problem perfectly.
CUDA Version: 11.0, docker-ce: 19.03.7, nvidia-docker2: 2.5.0-1
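For reference, a hedged sketch of that upgrade via apt (the pinned version string assumes 2.5.0-1 is available in your configured repo):
$ sudo apt-get update
$ sudo apt-get install nvidia-docker2=2.5.0-1
$ sudo systemctl restart docker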
@mildsunrise It seems nvidia-docker supports jetson but I am still getting this error even with nvidia-docker2-2.5.0-1
what makes you think nvidia-docker supports jetson? the FAQ still says you need the SDK manager (aka the jetson repos). you need the version of nvidia-docker2 that comes with the jetson repos, not the nvidia-docker one
@mildsunrise ah you mean "jetpack" by "jetson repos" don't you? So that might very well be my issue. I am using the stock kernel ConnectTech provides and presumed because it had L4T 32.4.4 installed that it had Jetpack 4.2.2 installed, but I think I need to reflash it because the manufacturer probably does just a minimal install for QA purposes.
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson
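One hedged way to check which repo the installed packages actually came from (so the L4T/JetPack build of nvidia-docker2 can be told apart from the one on nvidia.github.io):
$ apt-cache policy nvidia-docker2
$ apt-cache policy nvidia-container-runtime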
I am getting the same error with nvidia-container-toolkit/bionic,now 1.5.1-1 amd64 under Ubuntu Server 20.04 LTS, running headless.
I installed the NVIDIA drivers via the .run file downloaded from the official NVIDIA page, and nvidia-smi is working, as is the hashcat benchmark.
However, when I run docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi with docker-ce/focal,now 5:20.10.7~3-0~ubuntu-focal amd64, I get the aforementioned error docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I already tried newer drivers from the repositories up until version 470, but nothing worked.
Any ideas?
It seems that NVidia continues ignoring Linux support
@AlexAshs sorry for the delay in getting back to you. Would you mind creating a new ticket and including the debug output from /var/log/nvidia-container-toolkit.log? This logging can be enabled by uncommenting the #debug= line in the nvidia-container-cli section of the /etc/nvidia-container-runtime/config.toml file.
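For example, after uncommenting, the relevant lines of /etc/nvidia-container-runtime/config.toml should look roughly like this (the path shown is the default debug log location assumed here):
[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
Then re-run the failing docker run command and attach the resulting log to the new issue.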
The reason I ask for a new issue is that this one has gotten quite long and seems to contain a mix of issues related to Jetson platforms and others that have been marked as fixed.
@elezar No worries, I was just trying things out; since this is my first dedicated GPU, nothing is in production just yet :D I have posted my issue here: Containers with gpus not starting up. I really don't post issues often, I prefer finding solutions first, so if there is something missing or the title sucks, just let me know so I can provide what is necessary to tackle this.
I have the exact same problem.
Configuration: Host: Windows 10 with WSL2, with CUDA installed.
Error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Command: docker run --gpus all --cpus 2 --name test -it pytorch/pytorch
Hardware: NVidia GeForce GTX 1660 TI
Any solution to this problem?
Are you installing updates from the windows insider dev channel? It seems to be a requirement for this setup to work.
@AlexAshs I am not signed up for the Insiders program. Could you tell me which update is necessary? I will install it manually.
@TheMarshalMole I found this guide, that should make things easier for you: https://www.forecr.io/blogs/installation/nvidia-docker-installation-for-ubuntu-in-wsl-2
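Roughly, the steps in that guide reduce to the following inside the WSL2 Ubuntu distribution (a hedged sketch following the standard nvidia-docker repository setup; details may have changed since):
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo service docker restart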
I installed my GPU driver from Software & Updates >> Additional Drivers and it solved my problem.