docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown. ERRO[0001] error waiting for container: context canceled
This error is similar to issue #1393.
I am working on a device called ZF-ProAI that uses an Nvidia Xavier SoC (8-core CPU @ 2.1 GHz, Volta GPU, 4 TPC) with Linux tegra-ubuntu 4.14.78-rt44-tegra installed. The hardware is sold with this OS preinstalled and with CUDA 10.1 for AI development. A standalone Python application for object detection works fine on this hardware.
Now I want to containerize a sample Python application with Docker, using a base image from the NGC catalog, so that it can access the GPU.
Sample python application
import torch
import time

while True:
    # Prints True if a CUDA-capable GPU is visible to PyTorch
    print("gpu usage =", torch.cuda.is_available())
    time.sleep(1)
Dockerfile
# Base image
FROM nvcr.io/nvidia/l4t-base:r32.6.1
All packages seem to be installed as explained in https://www.forecr.io/blogs/programming/nvidia-container-runtime-1-installation
nvidia@tegra-ubuntu:~/Desktop/test_env/docker_local_test/nvidia-docker$ sudo dpkg --get-selections *nvidia*
libnvidia-cfg1-465:arm64 install
libnvidia-common-465 install
libnvidia-compute-465:arm64 install
libnvidia-container-tools install
libnvidia-container0:arm64 install # present
libnvidia-container1:arm64 install
libnvidia-decode-465:arm64 install
libnvidia-encode-465:arm64 install
libnvidia-extra-465:arm64 install
libnvidia-fbc1-465:arm64 install
libnvidia-gl-465:arm64 install
libnvidia-ifr1-465:arm64 install
nvidia-compute-utils-465 install
nvidia-container-runtime install # present
nvidia-container-toolkit install # present
nvidia-dkms-465 install
nvidia-docker2 install # present
nvidia-driver-465 install
nvidia-kernel-common-465 install
nvidia-kernel-source-465 install
nvidia-modprobe install
nvidia-prime install
nvidia-settings install
nvidia-utils-465 install
xserver-xorg-video-nvidia-465 install
The Nvidia driver already seems to be installed:
nvidia@tegra-ubuntu:~$ sudo apt install nvidia-driver-465
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-driver-465 is already the newest version (465.19.01-0ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
After building the Docker image, running the container produces the error shown below:
nvidia@tegra-ubuntu:~$ docker run -it --runtime nvidia l4t-nvcrio
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0001] error waiting for container: context canceled
Can someone help me with this issue? Please let me know if you need further information. Thanks in Advance.
@avinaash67 is Jetpack installed on this platform?
@elezar Thanks for the reply. I am not fully aware of how this SoC was flashed on the ZF-ProAI. I tried the following command, as shown in this link: https://forums.developer.nvidia.com/t/how-to-check-the-jetpack-version/69549
nvidia@tegra-ubuntu:$ dpkg -l | grep -i 'jetpack'
ii libnvidia-container0:arm64 0.10.0+jetpack arm64 NVIDIA container runtime library
Is there a file called /etc/nv_tegra_release on your system? The jetpack normally installs this file and we use it in libnvidia-container to detect that you are on a tegra device instead of a server machine. If it does not exist, then try just creating it (its contents do not matter), and see if that resolves your issue.
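A quick way to confirm that the file is present at that exact absolute path (and was not created in some other directory) is a minimal sketch like the one below. The helper name `tegra_marker_present` is hypothetical and only for illustration; the path is the one named in the reply above.

```python
import os

# Path named in the reply above. Per that reply, only the file's
# existence matters, not its contents.
TEGRA_RELEASE_FILE = "/etc/nv_tegra_release"


def tegra_marker_present(path: str = TEGRA_RELEASE_FILE) -> bool:
    """Return True if a regular file exists at the given path."""
    return os.path.isfile(path)


if __name__ == "__main__":
    if tegra_marker_present():
        print(f"{TEGRA_RELEASE_FILE} exists")
    else:
        # Creating a file under /etc normally requires root privileges.
        print(f"{TEGRA_RELEASE_FILE} is missing; "
              f"try: sudo touch {TEGRA_RELEASE_FILE}")
```

Note that running `touch nv_tegra_release` without the `/etc/` prefix creates the file in the current working directory, which this check would report as missing.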
@klueska
/etc/nv_tegra_release is not present on the device. I created it using the command touch nv_tegra_release. Then I reran the docker container, but the same error persists.
nvidia@tegra-ubuntu:~$ docker run -it --runtime nvidia l4t-nvcrio
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0001] error waiting for container: context canceled
What version of the various toolkit components do you have installed?
i.e. running the following (the list you provided before did not include the versions of these components):
$ dpkg -l '*nvidia*'
You need to have at least these versions installed or newer:
ii libnvidia-container-tools 1.7.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:arm64 1.7.0-1 arm64 NVIDIA container runtime library
ii nvidia-container-runtime 3.7.0-1 all NVIDIA container runtime
ii nvidia-container-toolkit 1.7.0-1 arm64 NVIDIA container runtime hook
ii nvidia-docker2 2.8.0-1 all nvidia-docker CLI wrapper
# AND
ii libnvidia-container0:arm64 0.10.0+jetpack arm64 NVIDIA container runtime library
# OR
ii libnvidia-container0:arm64 0.9.0+beta.1 arm64 NVIDIA container runtime library
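Comparing a long `dpkg -l` listing against these minimums by eye is error-prone, so here is a rough sketch that does it mechanically. The function names are hypothetical, and the version comparison is a simplification (it compares only the integers in each version string, not full Debian version ordering), which should be adequate for the simple `X.Y.Z-N` versions listed above.

```python
import re

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "libnvidia-container-tools": "1.7.0-1",
    "libnvidia-container1": "1.7.0-1",
    "nvidia-container-runtime": "3.7.0-1",
    "nvidia-container-toolkit": "1.7.0-1",
    "nvidia-docker2": "2.8.0-1",
}


def version_key(version: str) -> tuple:
    """Simplified sort key: the integers in the version string, in order."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def installed_versions(dpkg_output: str) -> dict:
    """Map package name -> version for lines marked 'ii' (installed)."""
    found = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            name = fields[1].split(":")[0]  # drop an ':arm64' style suffix
            found[name] = fields[2]
    return found


def outdated(dpkg_output: str) -> dict:
    """Return {package: (installed_or_None, required)} for unmet minimums."""
    have = installed_versions(dpkg_output)
    problems = {}
    for name, required in MINIMUM_VERSIONS.items():
        current = have.get(name)
        if current is None or version_key(current) < version_key(required):
            problems[name] = (current, required)
    return problems
```

To use it, save the output of `dpkg -l '*nvidia*'` to a text file, read it into a string, and pass that string to `outdated()`. An empty result means every listed minimum is met.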
@klueska The output of $ dpkg -l '*nvidia*' on my hardware is shown below:
||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-465 465.19.01-0ubuntu1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA libcompute package
ii libnvidia-container-tools 1.7.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container0:arm64 0.10.0+jetpack arm64 NVIDIA container runtime library
ii libnvidia-container1:arm64 1.7.0-1 arm64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-465:arm64 465.19.01-0ubuntu1 arm64 NVENC Video Encoding runtime library
un libnvidia-extra <none> <none> (no description available)
ii libnvidia-extra-465:arm64 465.19.01-0ubuntu1 arm64 Extra libraries for the NVIDIA driver
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un libnvidia-ifr1 <none> <none> (no description available)
ii libnvidia-ifr1-465:arm64 465.19.01-0ubuntu1 arm64 NVIDIA OpenGL-based Inband Frame Readback runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
ii nvidia-compute-utils-465 465.19.01-0ubuntu1 arm64 NVIDIA compute utilities
ii nvidia-container-runtime 3.6.0-1 all NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.7.0-1 arm64 NVIDIA container runtime hook
ii nvidia-dkms-465 465.19.01-0ubuntu1 arm64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.8.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver-465 465.19.01-0ubuntu1 arm64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
ii nvidia-kernel-common-465 465.19.01-0ubuntu1 arm64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-source-465 465.19.01-0ubuntu1 arm64 NVIDIA kernel source package
un nvidia-libopencl1-dev <none> <none> (no description available)
ii nvidia-modprobe 465.19.01-0ubuntu1 arm64 Load the NVIDIA kernel driver and create device files
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.16~0.18.04.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 465.19.01-0ubuntu1 arm64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-465 465.19.01-0ubuntu1 arm64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-465 465.19.01-0ubuntu1 arm64 NVIDIA binary Xorg driver
The versions of the packages seem to be correct.
Hmm. Something strange is going on then.
So long as you have the /etc/nv_tegra_release file in place, there should be no path for it to issue an nvml error as you are seeing in your error string. There is no nvml on tegra, so a completely different path is taken to initialize libnvidia-container and you would never see an error string with nvml in it.