nvidia-container-cli: initialization error on Ubuntu 22.04 LTS

Open Lonitch opened this issue 3 years ago • 7 comments

Hi there,

I recently wanted to build containers that can run GUI applications. My Dockerfile and docker-compose.yml work well in WSL2, but I ran into problems when building the same container on Ubuntu 22.04 LTS. My Dockerfile looks like this:

FROM osrf/ros:melodic-desktop-full

SHELL ["/bin/bash", "-c"]

# Minimal setup
RUN echo "source /opt/ros/melodic/setup.bash" >> ~/.bashrc
RUN source ~/.bashrc
# Extra pkg installation after this!

And docker-compose.yml looks like

services:
  melodic:
    build: .
    image: melodic
    command: roslaunch gazebo_ros empty_world.launch &&
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
    environment:
      - DISPLAY=${DISPLAY}
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=all
      - QT_X11_NO_MITSHM=1
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ${PWD}/.Xauthority:/root/.Xauthority:rw
    network_mode: "host"

When I run docker compose up, the following error pops up:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
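
The hook's message says the dynamic loader could not open libnvidia-ml.so.1 in the environment where nvidia-container-cli ran. A minimal sketch for checking whether a given shared library resolves on a machine, using Python's ctypes. The helper name can_load is only for illustration, and a result from the host shell does not show what is visible in the environment of the Docker daemon itself:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    # libnvidia-ml.so.1 is the library named in the error message above.
    print("libnvidia-ml.so.1 loadable:", can_load("libnvidia-ml.so.1"))
```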

--------Steps I've taken so far--------

My theory was that something was wrong with the NVIDIA runtime, so I added

runtime: nvidia

before command in the docker-compose.yml above. When I ran docker compose up again, I got the following error:

Error response from daemon: Unknown runtime specified nvidia
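
This message means the daemon that received the request has no runtime registered under the name nvidia. A sketch for checking that, assuming the text produced by docker info --format '{{json .Runtimes}}' is passed in as a string. The helper name has_runtime and the sample string are illustrative, not captured from this machine:

```python
import json


def has_runtime(runtimes_json: str, name: str) -> bool:
    """Check whether a runtime name appears in the daemon's runtime map."""
    runtimes = json.loads(runtimes_json)
    # The daemon reports runtimes as a JSON object keyed by runtime name.
    return isinstance(runtimes, dict) and name in runtimes


if __name__ == "__main__":
    # Illustrative sample only: a daemon that knows just the default runc runtime.
    sample = '{"runc": {"path": "runc"}}'
    print(has_runtime(sample, "nvidia"))
```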

Next, I followed the steps listed here to add the runtime using

sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

which produced the following output:

INFO[2022-07-11T10:30:18.583217896-05:00] Starting up                                  
INFO[2022-07-11T10:30:18.584301607-05:00] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf 
INFO[2022-07-11T10:30:18.585538257-05:00] parsed scheme: "unix"                         module=grpc
INFO[2022-07-11T10:30:18.585571148-05:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2022-07-11T10:30:18.585618515-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2022-07-11T10:30:18.585635177-05:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2022-07-11T10:30:18.586921837-05:00] parsed scheme: "unix"                         module=grpc
INFO[2022-07-11T10:30:18.586945960-05:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2022-07-11T10:30:18.586973034-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2022-07-11T10:30:18.586984481-05:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2022-07-11T10:30:18.595208030-05:00] [graphdriver] using prior storage driver: overlay2 
failed to start daemon: error while opening volume store metadata database: timeout

I also tried adding the runtime using a systemd drop-in file, but the error persisted even after I rebooted the machine.
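
For comparison, a runtime can also be registered through the daemon configuration file rather than by starting a second dockerd process (the timeout in the log above can occur when another daemon instance already has the data directory open). The sketch below adds an entry to a dictionary in the shape used by /etc/docker/daemon.json, assuming the daemon in use is the system dockerd service that reads that file; Docker Desktop manages its engine settings separately. The binary path is the one from the command above, and add_nvidia_runtime is a hypothetical helper name:

```python
import json


def add_nvidia_runtime(config: dict, path: str = "/usr/bin/nvidia-container-runtime") -> dict:
    """Return a copy of a daemon.json-style dict with an 'nvidia' runtime entry."""
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    # "runtimes" maps a runtime name to the binary the daemon should invoke.
    runtimes["nvidia"] = {"path": path, "runtimeArgs": []}
    updated["runtimes"] = runtimes
    return updated


if __name__ == "__main__":
    # Start from an empty configuration; a real file may hold other keys.
    print(json.dumps(add_nvidia_runtime({}), indent=4))
```

The daemon only picks up a changed configuration file after a restart, for example with sudo systemctl restart docker.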

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
I0711 15:35:59.889166 508754 nvc.c:376] initializing library context (version=1.10.0, build=395fd41701117121f1fd04ada01e1d7e006a37ae)
I0711 15:35:59.889215 508754 nvc.c:350] using root /
I0711 15:35:59.889219 508754 nvc.c:351] using ldcache /etc/ld.so.cache
I0711 15:35:59.889222 508754 nvc.c:352] using unprivileged user 1000:1000
I0711 15:35:59.889243 508754 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0711 15:35:59.889443 508754 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0711 15:35:59.889929 508754 nvc.c:258] failed to detect NVIDIA devices
W0711 15:35:59.890282 508755 nvc.c:273] failed to set inheritable capabilities
W0711 15:35:59.890372 508755 nvc.c:274] skipping kernel modules load due to failure
I0711 15:35:59.890815 508756 rpc.c:71] starting driver rpc service
I0711 15:35:59.902386 508757 rpc.c:71] starting nvcgo rpc service
I0711 15:35:59.906779 508754 nvc_info.c:766] requesting driver information with ''
I0711 15:35:59.908232 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.515.48.07
I0711 15:35:59.908279 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.48.07
I0711 15:35:59.908509 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.48.07
I0711 15:35:59.908752 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.515.48.07
I0711 15:35:59.908945 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.515.48.07
I0711 15:35:59.909177 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.515.48.07
I0711 15:35:59.909421 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.48.07
I0711 15:35:59.909449 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.515.48.07
I0711 15:35:59.909680 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.48.07
I0711 15:35:59.909709 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.48.07
I0711 15:35:59.909736 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.48.07
I0711 15:35:59.909963 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.515.48.07
I0711 15:35:59.910180 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.515.48.07
I0711 15:35:59.910220 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.48.07
I0711 15:35:59.910443 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.515.48.07
I0711 15:35:59.910478 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.515.48.07
I0711 15:35:59.910721 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.515.48.07
I0711 15:35:59.910969 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.515.48.07
I0711 15:35:59.911125 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07
I0711 15:35:59.911228 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.515.48.07
I0711 15:35:59.911446 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.515.48.07
I0711 15:35:59.911613 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.515.48.07
I0711 15:35:59.911643 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.515.48.07
I0711 15:35:59.911793 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.515.48.07
I0711 15:35:59.912020 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.515.48.07
I0711 15:35:59.912203 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.515.48.07
I0711 15:35:59.912440 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.515.48.07
I0711 15:35:59.912658 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.515.48.07
I0711 15:35:59.912889 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.515.48.07
I0711 15:35:59.913119 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.515.48.07
I0711 15:35:59.913340 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.515.48.07
I0711 15:35:59.913560 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.515.48.07
I0711 15:35:59.913798 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.515.48.07
I0711 15:35:59.914030 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.515.48.07
I0711 15:35:59.914254 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.515.48.07
I0711 15:35:59.914481 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.515.48.07
I0711 15:35:59.914726 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libcuda.so.515.48.07
I0711 15:35:59.914974 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.515.48.07
I0711 15:35:59.915200 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.515.48.07
I0711 15:35:59.915387 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.515.48.07
I0711 15:35:59.915626 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.515.48.07
W0711 15:35:59.915641 508754 nvc_info.c:399] missing library libnvidia-nscq.so
W0711 15:35:59.915646 508754 nvc_info.c:399] missing library libcudadebugger.so
W0711 15:35:59.915650 508754 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W0711 15:35:59.915655 508754 nvc_info.c:399] missing library libnvidia-pkcs11.so
W0711 15:35:59.915662 508754 nvc_info.c:399] missing library libvdpau_nvidia.so
W0711 15:35:59.915666 508754 nvc_info.c:399] missing library libnvidia-ifr.so
W0711 15:35:59.915671 508754 nvc_info.c:399] missing library libnvidia-cbl.so
W0711 15:35:59.915676 508754 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W0711 15:35:59.915680 508754 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W0711 15:35:59.915684 508754 nvc_info.c:403] missing compat32 library libcudadebugger.so
W0711 15:35:59.915690 508754 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W0711 15:35:59.915693 508754 nvc_info.c:403] missing compat32 library libnvidia-allocator.so
W0711 15:35:59.915699 508754 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W0711 15:35:59.915706 508754 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W0711 15:35:59.915709 508754 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W0711 15:35:59.915718 508754 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W0711 15:35:59.915723 508754 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W0711 15:35:59.915727 508754 nvc_info.c:403] missing compat32 library libnvoptix.so
W0711 15:35:59.915733 508754 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
I0711 15:35:59.915936 508754 nvc_info.c:299] selecting /usr/bin/nvidia-smi
I0711 15:35:59.915958 508754 nvc_info.c:299] selecting /usr/bin/nvidia-debugdump
I0711 15:35:59.915970 508754 nvc_info.c:299] selecting /usr/bin/nvidia-persistenced
I0711 15:35:59.916000 508754 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-control
I0711 15:35:59.916017 508754 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-server
W0711 15:35:59.916079 508754 nvc_info.c:425] missing binary nv-fabricmanager
I0711 15:35:59.916534 508754 nvc_info.c:343] listing firmware path /usr/lib/firmware/nvidia/515.48.07/gsp.bin
I0711 15:35:59.916559 508754 nvc_info.c:529] listing device /dev/nvidiactl
I0711 15:35:59.916564 508754 nvc_info.c:529] listing device /dev/nvidia-uvm
I0711 15:35:59.916568 508754 nvc_info.c:529] listing device /dev/nvidia-uvm-tools
I0711 15:35:59.916571 508754 nvc_info.c:529] listing device /dev/nvidia-modeset
I0711 15:35:59.916597 508754 nvc_info.c:343] listing ipc path /run/nvidia-persistenced/socket
W0711 15:35:59.916620 508754 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W0711 15:35:59.916638 508754 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I0711 15:35:59.916642 508754 nvc_info.c:822] requesting device information with ''
I0711 15:35:59.922969 508754 nvc_info.c:713] listing device /dev/nvidia0 (GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612 at 00000000:17:00.0)
I0711 15:35:59.928540 508754 nvc_info.c:713] listing device /dev/nvidia1 (GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0 at 00000000:73:00.0)
NVRM version:   515.48.07
CUDA version:   11.7

Device Index:   0
Device Minor:   0
Model:          NVIDIA RTX A6000
Brand:          NvidiaRTX
GPU UUID:       GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612
Bus Location:   00000000:17:00.0
Architecture:   8.6

Device Index:   1
Device Minor:   1
Model:          NVIDIA RTX A6000
Brand:          NvidiaRTX
GPU UUID:       GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0
Bus Location:   00000000:73:00.0
Architecture:   8.6
I0711 15:35:59.928579 508754 nvc.c:434] shutting down library context
I0711 15:35:59.928619 508757 rpc.c:95] terminating nvcgo rpc service
I0711 15:35:59.929120 508754 rpc.c:135] nvcgo rpc service terminated successfully
I0711 15:35:59.932333 508756 rpc.c:95] terminating driver rpc service
I0711 15:35:59.932434 508754 rpc.c:135] driver rpc service terminated successfully
  • [x] Driver information from nvidia-smi -a
==============NVSMI LOG==============

Timestamp                                 : Mon Jul 11 10:38:04 2022
Driver Version                            : 515.48.07
CUDA Version                              : 11.7

Attached GPUs                             : 2
GPU 00000000:17:00.0
    Product Name                          : NVIDIA RTX A6000
    Product Brand                         : NVIDIA RTX
    Product Architecture                  : Ampere
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1320922039617
    GPU UUID                              : GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612
    Minor Number                          : 0
    VBIOS Version                         : 94.02.5C.00.07
    MultiGPU Board                        : No
    Board ID                              : 0x1700
    GPU Part Number                       : 900-5G133-0100-001
    Module ID                             : 0
    Inforom Version
        Image Version                     : G133.0500.00.05
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x17
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x223010DE
        Bus Id                            : 00000000:17:00.0
        Sub System Id                     : 0x14591028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 2
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 812000 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : 30 %
    Performance State                     : P5
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 49140 MiB
        Reserved                          : 454 MiB
        Used                              : 504 MiB
        Free                              : 48180 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 6 MiB
        Free                              : 250 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 192 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 39 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 93 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 22.24 W
        Power Limit                       : 300.00 W
        Default Power Limit               : 300.00 W
        Enforced Power Limit              : 300.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 300.00 W
    Clocks
        Graphics                          : 450 MHz
        SM                                : 450 MHz
        Memory                            : 810 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 750.000 mV
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2243
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 198 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2590
            Type                          : G
            Name                          : /usr/libexec/gnome-remote-desktop-daemon
            Used GPU Memory               : 4 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2713
            Type                          : G
            Name                          : /usr/bin/gnome-shell
            Used GPU Memory               : 78 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3094990
            Type                          : G
            Name                          : /opt/docker-desktop/Docker Desktop --type=gpu-process --enable-crashpad --enable-crash-reporter=bb2e72bd-deee-4039-8f1a-387044ef5ff0,no_channel --user-data-dir=/home/fit/.config/Docker Desktop --gpu-preferences=UAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAABgAAAAAAAAAGAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAACAAAAAAAAAA= --shared-files --field-trial-handle=0,9431087310652734068,13867430110768028059,131072 --disable-features=PlzServiceWorker,SpareRendererForSitePerProcess
            Used GPU Memory               : 21 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3156761
            Type                          : G
            Name                          : /usr/share/code/code --type=gpu-process --disable-color-correct-rendering --enable-crashpad --crashpad-handler-pid=3156633 --enable-crash-reporter=3cb89e58-246f-4914-bfb9-c5be6ca52941,no_channel --user-data-dir=/home/fit/.config/Code --gpu-preferences=WAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAA4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAIAAAAAAAAAA== --shared-files --field-trial-handle=0,i,15765629358387694078,15697432999206173879,131072 --disable-features=SpareRendererForSitePerProcess
            Used GPU Memory               : 31 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3350821
            Type                          : G
            Name                          : /snap/firefox/1551/usr/lib/firefox/firefox
            Used GPU Memory               : 166 MiB

GPU 00000000:73:00.0
    Product Name                          : NVIDIA RTX A6000
    Product Brand                         : NVIDIA RTX
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1320922039629
    GPU UUID                              : GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0
    Minor Number                          : 1
    VBIOS Version                         : 94.02.5C.00.07
    MultiGPU Board                        : No
    Board ID                              : 0x7300
    GPU Part Number                       : 900-5G133-0100-001
    Module ID                             : 0
    Inforom Version
        Image Version                     : G133.0500.00.05
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x73
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x223010DE
        Bus Id                            : 00000000:73:00.0
        Sub System Id                     : 0x14591028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 1576000 KB/s
    Fan Speed                             : 30 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 49140 MiB
        Reserved                          : 457 MiB
        Used                              : 106 MiB
        Free                              : 48576 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 5 MiB
        Free                              : 251 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 10 %
        Memory                            : 13 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 192 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 40 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 93 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 25.95 W
        Power Limit                       : 300.00 W
        Default Power Limit               : 300.00 W
        Enforced Power Limit              : 300.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 300.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 737.500 mV
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2243
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 105 MiB
  • [x] Docker version from docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.24
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:46 2022
 OS/Arch:           linux/amd64
 Context:           desktop-linux
 Experimental:      true

Server: Docker Desktop 4.10.1 (82475)
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:23 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
||/ Name                                       Version                    Architecture Description
+++-==========================================-==========================-============-====================================>
un  libgldispatch0-nvidia                      <none>                     <none>       (no description available)
ii  libnvidia-cfg1-515:amd64                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                         <none>                     <none>       (no description available)
un  libnvidia-common                           <none>                     <none>       (no description available)
ii  libnvidia-common-515                       515.48.07-0ubuntu0.22.04.2 all          Shared files used by the NVIDIA libraries
un  libnvidia-compute                          <none>                     <none>       (no description available)
ii  libnvidia-compute-515:amd64                515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA libcompute package
ii  libnvidia-compute-515:i386                 515.48.07-0ubuntu0.22.04.2 i386         NVIDIA libcompute package
ii  libnvidia-container-tools                  1.10.0-1                   amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.10.0-1                   amd64        NVIDIA container runtime library
un  libnvidia-decode                           <none>                     <none>       (no description available)
ii  libnvidia-decode-515:amd64                 515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-515:i386                  515.48.07-0ubuntu0.22.04.2 i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64               1:1.1.9-1.1                amd64        Wayland EGL External Platform library -- shared library
un  libnvidia-encode                           <none>                     <none>       (no description available)
ii  libnvidia-encode-515:amd64                 515.48.07-0ubuntu0.22.04.2 amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-515:i386                  515.48.07-0ubuntu0.22.04.2 i386         NVENC Video Encoding runtime library
un  libnvidia-encode1                          <none>                     <none>       (no description available)
un  libnvidia-extra                            <none>                     <none>       (no description available)
ii  libnvidia-extra-515:amd64                  515.48.07-0ubuntu0.22.04.2 amd64        Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                             <none>                     <none>       (no description available)
ii  libnvidia-fbc1-515:amd64                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-515:i386                    515.48.07-0ubuntu0.22.04.2 i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                               <none>                     <none>       (no description available)
un  libnvidia-gl-390                           <none>                     <none>       (no description available)
un  libnvidia-gl-410                           <none>                     <none>       (no description available)
ii  libnvidia-gl-515:amd64                     515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-515:i386                      515.48.07-0ubuntu0.22.04.2 i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-legacy-390xx-egl-wayland1        <none>                     <none>       (no description available)
un  libnvidia-ml1                              <none>                     <none>       (no description available)
ii  linux-modules-nvidia-515-5.15.0-40-generic 5.15.0-40.43+1             amd64        Linux kernel nvidia modules for version 5.15.0-40
ii  linux-modules-nvidia-515-generic-hwe-22.04 5.15.0-40.43+1             amd64        Extra drivers for nvidia-515 for the generic-hwe-22.04 flavour
ii  linux-objects-nvidia-515-5.15.0-40-generic 5.15.0-40.43+1             amd64        Linux kernel nvidia modules for version 5.15.0-40 (objects)
ii  linux-signatures-nvidia-5.15.0-40-generic  5.15.0-40.43+1             amd64        Linux kernel signatures for nvidia modules for version 5.15.0-4>
un  nvidia-384                                 <none>                     <none>       (no description available)
un  nvidia-390                                 <none>                     <none>       (no description available)
un  nvidia-common                              <none>                     <none>       (no description available)
un  nvidia-compute-utils                       <none>                     <none>       (no description available)
ii  nvidia-compute-utils-515                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA compute utilities
un  nvidia-container-runtime                   <none>                     <none>       (no description available)
un  nvidia-container-runtime-hook              <none>                     <none>       (no description available)
ii  nvidia-container-toolkit                   1.10.0-1                   amd64        NVIDIA container runtime hook
un  nvidia-dkms-515                            <none>                     <none>       (no description available)
un  nvidia-docker                              <none>                     <none>       (no description available)
ii  nvidia-docker2                             2.11.0-1                   all          nvidia-docker CLI wrapper
ii  nvidia-driver-515                          515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                       <none>                     <none>       (no description available)
un  nvidia-egl-wayland-common                  <none>                     <none>       (no description available)
un  nvidia-kernel-common                       <none>                     <none>       (no description available)
ii  nvidia-kernel-common-515                   515.48.07-0ubuntu0.22.04.2 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                       <none>                     <none>       (no description available)
ii  nvidia-kernel-source-515                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA kernel source package
un  nvidia-libopencl1-dev                      <none>                     <none>       (no description available)
un  nvidia-opencl-icd                          <none>                     <none>       (no description available)
un  nvidia-persistenced                        <none>                     <none>       (no description available)
un  nvidia-prebuilt-kernel                     <none>                     <none>       (no description available)
ii  nvidia-prime                               0.8.17.1                   all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            510.47.03-0ubuntu1         amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                     <none>                     <none>       (no description available)
un  nvidia-smi                                 <none>                     <none>       (no description available)
un  nvidia-utils                               <none>                     <none>       (no description available)
ii  nvidia-utils-515                           515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA driver support binaries
ii  xserver-xorg-video-nvidia-515              515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA binary Xorg driver
  • [x] NVIDIA container library version from nvidia-container-cli -V
cli-version: 1.10.0
lib-version: 1.10.0
build date: 2022-06-13T10:39+00:00
build revision: 395fd41701117121f1fd04ada01e1d7e006a37ae
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Any comments on this info? Thank you!

Lonitch avatar Jul 11 '22 15:07 Lonitch

Question: How are docker and docker compose installed? We have seen strange behaviour when these are installed using snaps.

Looking at the error message it seems as if the NVIDIA Container CLI cannot load the NVML library libnvidia-ml.so. This could occur if docker compose modifies the library search paths or ldcache in some way.
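One way to test the ldcache theory on the host is to ask the dynamic linker cache whether it knows about the library named in the error. This is only a minimal sketch: `check_lib` is a hypothetical helper name, and it assumes `ldconfig` is on the PATH (on some systems it lives in /sbin).

```shell
# Report whether a shared library is registered in the dynamic linker cache.
# check_lib is a hypothetical helper for illustration, not part of any NVIDIA tool.
check_lib() {
    lib="$1"
    # ldconfig -p prints the cache contents; grep -F matches the name literally.
    if ldconfig -p 2>/dev/null | grep -F -q "$lib"; then
        echo "found: $lib"
    else
        echo "missing: $lib"
        return 1
    fi
}

# The library the NVIDIA Container CLI failed to load in the error above.
check_lib libnvidia-ml.so.1 || true
```

If the library is reported as found on the host but the container still fails, the problem is more likely in how the hook is invoked than in the driver installation itself.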

elezar avatar Jul 12 '22 07:07 elezar

Thanks for your reply. I installed Docker Engine by following the steps from the Docker docs. After that, I installed Docker Desktop by following the instructions here.

I didn't use snaps.

Lonitch avatar Jul 12 '22 13:07 Lonitch

Sorry @Lonitch, I thought I had asked, but does running the container without docker compose work as expected:

docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu:22.04 nvidia-smi

elezar avatar Jul 14 '22 08:07 elezar

@elezar Thanks for your reply. Unfortunately, it does not work. It still gives docker: Error response from daemon: Unknown runtime specified nvidia.

Lonitch avatar Jul 14 '22 22:07 Lonitch

I just reinstalled Ubuntu 20.04 LTS on the machine, and the same error still shows up. I wonder if it has something to do with my dual graphics cards (2 NVIDIA RTX).

Lonitch avatar Jul 16 '22 01:07 Lonitch

@Lonitch have you tried installing drivers by issuing the following?
Step 1: sudo ubuntu-drivers autoinstall
Step 2: ubuntu-drivers devices
Step 3: Install the recommended option based on the previous terminal output, for example: sudo apt install nvidia-driver-515
Step 4: sudo reboot

This worked for me.

diegoavillegasg avatar Sep 14 '22 14:09 diegoavillegasg

@Lonitch since docker complains with:

docker: Error response from daemon: Unknown runtime specified nvidia.

What are the contents of your /etc/docker/daemon.json file or how would you instruct docker-compose to use the NVIDIA runtime?

Note that this configuration is independent of the driver or the GPUs that you have installed.
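For reference, a rough way to check whether the daemon config registers the runtime is sketched below. `has_nvidia_runtime` is a hypothetical helper, and the check is a plain text match rather than a real JSON parse, so treat its answer as a hint only. The nvidia-docker2 package normally ships a daemon.json with a "runtimes" section containing an "nvidia" entry whose "path" is nvidia-container-runtime.

```shell
# Report whether a Docker daemon config file mentions an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper; the default path is the file asked about above.
has_nvidia_runtime() {
    cfg="${1:-/etc/docker/daemon.json}"
    if [ ! -r "$cfg" ]; then
        echo "no readable config at $cfg"
        return 2
    fi
    # Crude text match (not a JSON parse): look for an "nvidia" key.
    if grep -q '"nvidia"[[:space:]]*:' "$cfg"; then
        echo "nvidia runtime entry present in $cfg"
    else
        echo "nvidia runtime entry absent from $cfg"
        return 1
    fi
}

has_nvidia_runtime || true

# Ask the daemon itself which runtimes it knows about (needs a reachable daemon).
command -v docker >/dev/null 2>&1 && docker info --format '{{json .Runtimes}}' || true
```

Note that the docker version output earlier in this thread shows Context: desktop-linux and a Docker Desktop server, so the daemon answering the CLI may not be the one that reads /etc/docker/daemon.json. Running docker context ls shows which endpoint is active.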

elezar avatar Sep 14 '22 14:09 elezar