docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown. ERRO[0001] error waiting for container: context canceled

Open avinaash67 opened this issue 3 years ago • 7 comments

The error is similar to issue #1393.

I am working on a device called ZF-ProAI, which uses an NVIDIA Xavier SoC (8-core CPU @ 2.1 GHz, Volta GPU, 4 TPC) and runs Linux tegra-ubuntu 4.14.78-rt44-tegra. The hardware is sold with this OS preinstalled and with CUDA 10.1 for AI development. A standalone Python application for object detection works fine on this hardware.

Now I want to containerize a sample Python application with Docker, using a base image from the NGC catalog, so that it can access the GPU.

Sample python application

import time

import torch

while True:
    # Prints True if PyTorch can see a CUDA-capable GPU
    print("gpu usage =", torch.cuda.is_available())
    time.sleep(1)

Dockerfile

# Base image
FROM nvcr.io/nvidia/l4t-base:r32.6.1

All packages seem to be installed as described in https://www.forecr.io/blogs/programming/nvidia-container-runtime-1-installation

nvidia@tegra-ubuntu:~/Desktop/test_env/docker_local_test/nvidia-docker$ sudo dpkg --get-selections *nvidia*
libnvidia-cfg1-465:arm64      install
libnvidia-common-465        install
libnvidia-compute-465:arm64     install
libnvidia-container-tools     install
libnvidia-container0:arm64      install        # present
libnvidia-container1:arm64      install
libnvidia-decode-465:arm64      install
libnvidia-encode-465:arm64      install
libnvidia-extra-465:arm64     install
libnvidia-fbc1-465:arm64      install
libnvidia-gl-465:arm64        install
libnvidia-ifr1-465:arm64      install
nvidia-compute-utils-465      install
nvidia-container-runtime      install            # present
nvidia-container-toolkit      install              # present
nvidia-dkms-465         install
nvidia-docker2          install                         # present
nvidia-driver-465       install
nvidia-kernel-common-465      install
nvidia-kernel-source-465      install
nvidia-modprobe         install
nvidia-prime          install
nvidia-settings         install
nvidia-utils-465        install
xserver-xorg-video-nvidia-465     install

The NVIDIA driver seems to be installed already:

nvidia@tegra-ubuntu:~$ sudo apt install nvidia-driver-465
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-driver-465 is already the newest version (465.19.01-0ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.

After building the Docker image, running the container produces the error shown below:

nvidia@tegra-ubuntu:~$ docker run -it --runtime nvidia l4t-nvcrio
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0001] error waiting for container: context canceled

Can someone help me with this issue? Please let me know if you need further information. Thanks in Advance.

avinaash67 avatar Feb 03 '22 12:02 avinaash67

@avinaash67 is Jetpack installed on this platform?

elezar avatar Feb 03 '22 12:02 elezar

@elezar Thanks for the reply. I am not fully aware of how this SoC was flashed on the ZF-ProAI. I tried the following command, as shown in this link: https://forums.developer.nvidia.com/t/how-to-check-the-jetpack-version/69549.

nvidia@tegra-ubuntu:$ dpkg -l | grep -i 'jetpack'
ii  libnvidia-container0:arm64                 0.10.0+jetpack                             arm64        NVIDIA container runtime library

avinaash67 avatar Feb 07 '22 09:02 avinaash67

Is there a file called /etc/nv_tegra_release on your system? The jetpack normally installs this file and we use it in libnvidia-container to detect that you are on a tegra device instead of a server machine. If it does not exist, then try just creating it (its contents do not matter), and see if that resolves your issue.

klueska avatar Feb 07 '22 10:02 klueska

@klueska /etc/nv_tegra_release is not present on the device. I created it using the command touch nv_tegra_release. When I rerun the docker container, the same error persists.

nvidia@tegra-ubuntu:~$ docker run -it --runtime nvidia l4t-nvcrio
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0001] error waiting for container: context canceled
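
Because touch with a relative path creates the file in the current working directory, it may be worth confirming that the file really exists at /etc/nv_tegra_release. The following is a minimal sketch of such a check; the function name marker_status is hypothetical, and the only thing taken from the thread is the path itself.

```python
from pathlib import Path

# Path named in the thread; libnvidia-container is said to look for this file.
TEGRA_MARKER = Path("/etc/nv_tegra_release")


def marker_status(marker: Path = TEGRA_MARKER) -> str:
    """Report whether the marker file exists at the given path."""
    if marker.is_file():
        return f"{marker} exists"
    return f"{marker} is missing"


if __name__ == "__main__":
    print(marker_status())
    # A stray copy in the working directory would suggest that touch
    # was run with a relative path somewhere other than /etc.
    stray = Path.cwd() / TEGRA_MARKER.name
    if stray.is_file() and stray.resolve() != TEGRA_MARKER.resolve():
        print(f"note: found {stray}, which is not the path the library checks")
```

If the script reports the file as missing, creating it with an absolute path (and sufficient privileges) before retrying would rule this out as a cause.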

avinaash67 avatar Feb 07 '22 10:02 avinaash67

What version of the various toolkit components do you have installed?

i.e. running the following (the list you provided before did not include the versions of these components):

$ dpkg -l '*nvidia*'

You need to have at least these versions installed or newer:

ii  libnvidia-container-tools              1.7.0-1                           arm64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:arm64             1.7.0-1                           arm64        NVIDIA container runtime library
ii  nvidia-container-runtime               3.7.0-1                           all          NVIDIA container runtime
ii  nvidia-container-toolkit               1.7.0-1                           arm64        NVIDIA container runtime hook
ii  nvidia-docker2                         2.8.0-1                           all          nvidia-docker CLI wrapper

# AND 

ii  libnvidia-container0:arm64             0.10.0+jetpack                           arm64        NVIDIA container runtime library

# OR

ii  libnvidia-container0:arm64             0.9.0+beta.1                           arm64        NVIDIA container runtime library

klueska avatar Feb 07 '22 10:02 klueska

@klueska The output of dpkg -l '*nvidia*' on my hardware is shown below:

||/ Name                                          Version                     Architecture                Description
+++-=============================================-===========================-===========================-===============================================================================================
un  libgldispatch0-nvidia                         <none>                      <none>                      (no description available)
ii  libnvidia-cfg1-465:arm64                      465.19.01-0ubuntu1          arm64                       NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                            <none>                      <none>                      (no description available)
un  libnvidia-common                              <none>                      <none>                      (no description available)
ii  libnvidia-common-465                          465.19.01-0ubuntu1          all                         Shared files used by the NVIDIA libraries
ii  libnvidia-compute-465:arm64                   465.19.01-0ubuntu1          arm64                       NVIDIA libcompute package
ii  libnvidia-container-tools                     1.7.0-1                     arm64                       NVIDIA container runtime library (command-line tools)
ii  libnvidia-container0:arm64                    0.10.0+jetpack              arm64                       NVIDIA container runtime library
ii  libnvidia-container1:arm64                    1.7.0-1                     arm64                       NVIDIA container runtime library
un  libnvidia-decode                              <none>                      <none>                      (no description available)
ii  libnvidia-decode-465:arm64                    465.19.01-0ubuntu1          arm64                       NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                              <none>                      <none>                      (no description available)
ii  libnvidia-encode-465:arm64                    465.19.01-0ubuntu1          arm64                       NVENC Video Encoding runtime library
un  libnvidia-extra                               <none>                      <none>                      (no description available)
ii  libnvidia-extra-465:arm64                     465.19.01-0ubuntu1          arm64                       Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                                <none>                      <none>                      (no description available)
ii  libnvidia-fbc1-465:arm64                      465.19.01-0ubuntu1          arm64                       NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                                  <none>                      <none>                      (no description available)
ii  libnvidia-gl-465:arm64                        465.19.01-0ubuntu1          arm64                       NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                                <none>                      <none>                      (no description available)
ii  libnvidia-ifr1-465:arm64                      465.19.01-0ubuntu1          arm64                       NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                                 <none>                      <none>                      (no description available)
un  nvidia-304                                    <none>                      <none>                      (no description available)
un  nvidia-340                                    <none>                      <none>                      (no description available)
un  nvidia-384                                    <none>                      <none>                      (no description available)
un  nvidia-390                                    <none>                      <none>                      (no description available)
un  nvidia-common                                 <none>                      <none>                      (no description available)
ii  nvidia-compute-utils-465                      465.19.01-0ubuntu1          arm64                       NVIDIA compute utilities
ii  nvidia-container-runtime                      3.6.0-1                     all                         NVIDIA container runtime
un  nvidia-container-runtime-hook                 <none>                      <none>                      (no description available)
ii  nvidia-container-toolkit                      1.7.0-1                     arm64                       NVIDIA container runtime hook
ii  nvidia-dkms-465                               465.19.01-0ubuntu1          arm64                       NVIDIA DKMS package
un  nvidia-dkms-kernel                            <none>                      <none>                      (no description available)
un  nvidia-docker                                 <none>                      <none>                      (no description available)
ii  nvidia-docker2                                2.8.0-1                     all                         nvidia-docker CLI wrapper
ii  nvidia-driver-465                             465.19.01-0ubuntu1          arm64                       NVIDIA driver metapackage
un  nvidia-driver-binary                          <none>                      <none>                      (no description available)
un  nvidia-kernel-common                          <none>                      <none>                      (no description available)
ii  nvidia-kernel-common-465                      465.19.01-0ubuntu1          arm64                       Shared files used with the kernel module
un  nvidia-kernel-source                          <none>                      <none>                      (no description available)
ii  nvidia-kernel-source-465                      465.19.01-0ubuntu1          arm64                       NVIDIA kernel source package
un  nvidia-libopencl1-dev                         <none>                      <none>                      (no description available)
ii  nvidia-modprobe                               465.19.01-0ubuntu1          arm64                       Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                             <none>                      <none>                      (no description available)
un  nvidia-persistenced                           <none>                      <none>                      (no description available)
ii  nvidia-prime                                  0.8.16~0.18.04.1            all                         Tools to enable NVIDIA's Prime
ii  nvidia-settings                               465.19.01-0ubuntu1          arm64                       Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                        <none>                      <none>                      (no description available)
un  nvidia-smi                                    <none>                      <none>                      (no description available)
un  nvidia-utils                                  <none>                      <none>                      (no description available)
ii  nvidia-utils-465                              465.19.01-0ubuntu1          arm64                       NVIDIA driver support binaries
ii  xserver-xorg-video-nvidia-465                 465.19.01-0ubuntu1          arm64                       NVIDIA binary Xorg driver

The versions of the packages seem to be correct.
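
One way to double-check this against the minimum versions listed earlier in the thread is a small comparison script. This is only a sketch: the helper names version_key and outdated are hypothetical, and the parsing is a naive split into numeric fields, not dpkg's full version ordering (dpkg --compare-versions is the authoritative tool). Under that naive comparison, nvidia-container-runtime 3.6.0-1 comes out below the 3.7.0-1 minimum, which may be worth a second look.

```python
import re

# Minimum versions quoted earlier in the thread.
MINIMUM = {
    "libnvidia-container-tools": "1.7.0-1",
    "libnvidia-container1": "1.7.0-1",
    "nvidia-container-runtime": "3.7.0-1",
    "nvidia-container-toolkit": "1.7.0-1",
    "nvidia-docker2": "2.8.0-1",
}

# Versions reported in the dpkg -l listing above.
INSTALLED = {
    "libnvidia-container-tools": "1.7.0-1",
    "libnvidia-container1": "1.7.0-1",
    "nvidia-container-runtime": "3.6.0-1",
    "nvidia-container-toolkit": "1.7.0-1",
    "nvidia-docker2": "2.8.0-1",
}


def version_key(version: str) -> tuple:
    """Naive sort key: every run of digits, in order (not dpkg's algorithm)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def outdated(installed: dict, minimum: dict) -> list:
    """Return package names that are absent or older than the required minimum."""
    return [
        name
        for name, required in minimum.items()
        if name not in installed
        or version_key(installed[name]) < version_key(required)
    ]


if __name__ == "__main__":
    for name in outdated(INSTALLED, MINIMUM):
        print(f"{name}: installed {INSTALLED.get(name, 'none')}, need >= {MINIMUM[name]}")
```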

avinaash67 avatar Feb 07 '22 11:02 avinaash67

Hmm. Something strange is going on then.

So long as you have the /etc/nv_tegra_release file in place, there should be no path for it to issue an nvml error as you are seeing in your error string. There is no nvml on tegra, so a completely different path is taken to initialize libnvidia-container and you would never see an error string with nvml in it.
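
The reasoning above can be written down as a small triage helper. This is a rough sketch based only on the description in this comment, not on the library's source; the function name classify_init_error and its return strings are hypothetical.

```python
def classify_init_error(stderr: str, marker_present: bool) -> str:
    """Interpret an nvidia-container-cli error using the rule described above."""
    mentions_nvml = "nvml" in stderr.lower()
    if mentions_nvml and marker_present:
        # Per the comment above, this combination should not occur.
        return "unexpected: nvml error although the marker file is present"
    if mentions_nvml:
        return "nvml path taken: the device was not detected as tegra"
    return "no nvml error: the failure comes from somewhere else"


if __name__ == "__main__":
    error = "nvidia-container-cli: initialization error: nvml error: driver not loaded"
    print(classify_init_error(error, marker_present=False))
```

An "unexpected" result would point toward checking that the marker file is at the exact path the library reads and that the library version in use is the one expected.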

klueska avatar Feb 07 '22 15:02 klueska