
Got `docker: Error response from daemon: OCI runtime create failed:` only while NVLink attached

Status: Open. cttsai1985 opened this issue 5 years ago • 10 comments

1. Issue or feature description

I get the error message below when running docker run --gpus all nvidia/cuda:10.1-runtime nvidia-smi. However, the same command runs successfully once the NVLink bridge is physically removed.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: timed out\\\\n\\\"\"": unknown.
ERRO[0032] error waiting for container: context canceled 
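The daemon error nests several layers (runc's container_linux.go and process_linux.go, then prestart hook 0); the actual root cause is the nvidia-container-cli message in the hook's stderr at the innermost level. A minimal Python sketch for pulling that message out of such an error string follows. The helper name hook_root_cause and the regular expression are illustrative assumptions, not part of Docker or nvidia-docker, and they assume the hook output keeps the "stderr: nvidia-container-cli: ..." shape seen here.

```python
import re
from typing import Optional

# The hook's stderr sits at the innermost level of the nested error; each
# wrapping layer adds backslash escaping, so stop at the first "\" or quote.
_HOOK_STDERR = re.compile(r'stderr: (nvidia-container-cli: [^\\"]+)')

# Daemon error from this report, kept as raw strings so escaping is preserved.
SAMPLE_ERROR = (
    r'docker: Error response from daemon: OCI runtime create failed: '
    r'container_linux.go:345: starting container process caused '
    r'"process_linux.go:430: container init caused \"process_linux.go:413: '
    r'running prestart hook 0 caused \\\"error running hook: exit status 1, '
    r'stdout: , stderr: nvidia-container-cli: initialization error: '
    r'driver error: timed out\\\\n\\\"\"": unknown.'
)


def hook_root_cause(docker_error: str) -> Optional[str]:
    """Return the innermost nvidia-container-cli message, or None if absent."""
    match = _HOOK_STDERR.search(docker_error)
    return match.group(1).strip() if match else None


print(hook_root_cause(SAMPLE_ERROR))
# prints: nvidia-container-cli: initialization error: driver error: timed out
```

Here the extracted message points at the driver (a timeout during initialization) rather than at Docker or runc themselves.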

2. Steps to reproduce the issue

Run docker run --gpus all nvidia/cuda:10.1-runtime nvidia-smi while the NVLink bridge is attached. The command takes a long time to fail, and the whole system is unresponsive in the meantime.
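When re-running this step, it can help to bound the client-side wait and record how long the failure takes. The sketch below is a hypothetical harness (the names run_repro, ReproResult, REPRO_CMD and SIGNATURE are illustrative) that assumes Python 3.7+ and a docker CLI on PATH. It treats the substring "driver error: timed out" from the error above as the failure signature. Note that subprocess.run raises subprocess.TimeoutExpired if the command exceeds timeout_s, and that a client-side timeout does nothing about the host becoming unresponsive.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import List

# Reproduction command from this report (requires Docker and the NVIDIA toolkit).
REPRO_CMD = ["docker", "run", "--gpus", "all", "nvidia/cuda:10.1-runtime", "nvidia-smi"]

# Substring of the hook error reported in this issue, used as the failure signature.
SIGNATURE = "driver error: timed out"


@dataclass
class ReproResult:
    returncode: int       # exit status of the command
    elapsed_s: float      # wall-clock seconds until the command exited
    hook_timed_out: bool  # True if stderr contains SIGNATURE


def run_repro(cmd: List[str], timeout_s: float = 120.0) -> ReproResult:
    """Run cmd once, timing it and checking stderr for SIGNATURE.

    Raises subprocess.TimeoutExpired if cmd runs longer than timeout_s.
    """
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    elapsed = time.monotonic() - start
    return ReproResult(proc.returncode, elapsed, SIGNATURE in proc.stderr)
```

For example, calling run_repro(REPRO_CMD) once with the bridge attached and once without gives two results whose elapsed_s and hook_timed_out fields can be compared directly.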

3. Information to attach (optional if deemed irrelevant)

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I0912 02:27:54.664955 8010 nvc.c:281] initializing library context (version=1.0.5, build=13b836390888f7b7c7dca115d16d7e28ab15a836)
I0912 02:27:54.665002 8010 nvc.c:255] using root /
I0912 02:27:54.665007 8010 nvc.c:256] using ldcache /etc/ld.so.cache
I0912 02:27:54.665011 8010 nvc.c:257] using unprivileged user 65534:65534
I0912 02:27:54.666601 8011 nvc.c:191] loading kernel module nvidia
I0912 02:27:54.666880 8011 nvc.c:203] loading kernel module nvidia_uvm
I0912 02:27:54.667030 8011 nvc.c:211] loading kernel module nvidia_modeset
I0912 02:27:54.667366 8012 driver.c:133] starting driver service
W0912 02:28:19.702479 8010 driver.c:220] terminating driver service (forced)
I0912 02:28:26.875583 8010 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: driver error: timed out
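The debug log shows the driver service starting at 02:27:54.667 and being force-terminated at 02:28:19.702, roughly 25 seconds later, after which nvidia-container-cli reports "driver error: timed out". A hypothetical helper for measuring that interval from glog-formatted lines (<level><MMDD> <time> <thread id> <file:line>] <message>, as in the log above) might look like this; the function names are illustrative and not part of libnvidia-container.

```python
import re
from datetime import datetime
from typing import List, Tuple

# glog layout: <level><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
_GLOG = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+\]\s(.*)$")

# Relevant excerpt of the debug log from this report.
SAMPLE_LOG = """\
I0912 02:27:54.667366 8012 driver.c:133] starting driver service
W0912 02:28:19.702479 8010 driver.c:220] terminating driver service (forced)
I0912 02:28:26.875583 8010 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: driver error: timed out
"""


def parse_glog(lines: List[str]) -> List[Tuple[datetime, str, str]]:
    """Return (timestamp, level, message) for every glog-formatted line."""
    entries = []
    for line in lines:
        m = _GLOG.match(line.strip())
        if m is None:
            continue  # e.g. the final "initialization error" summary line
        level, mmdd, hms, msg = m.groups()
        # glog omits the year; a placeholder leap year is fine because only
        # differences between timestamps are used.
        ts = datetime.strptime(f"2000{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")
        entries.append((ts, level, msg))
    return entries


def driver_service_runtime(lines: List[str]) -> float:
    """Seconds from 'starting driver service' to 'terminating driver service'."""
    start = end = None
    for ts, _level, msg in parse_glog(lines):
        if msg.startswith("starting driver service"):
            start = ts
        elif msg.startswith("terminating driver service"):
            end = ts
    if start is None or end is None:
        raise ValueError("driver service start/terminate lines not found")
    return (end - start).total_seconds()


print(round(driver_service_runtime(SAMPLE_LOG.splitlines()), 3))  # prints 25.035
```

Comparing this interval across runs (with and without the bridge) would show whether the failure is always the same fixed timeout.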

  • [x] Kernel version from uname -a
Linux ThreadRipperRTX 5.0.0-27-generic #28~18.04.1-Ubuntu SMP Thu Aug 22 03:00:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • [ ] Any relevant kernel output lines from dmesg
  • [x] Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp                           : Wed Sep 11 22:29:19 2019
Driver Version                      : 418.87.00
CUDA Version                        : 10.1

Attached GPUs                       : 3
GPU 00000000:08:00.0
    Product Name                    : GeForce RTX 2080 Ti
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-13b5329a-0931-95b3-50cf-6532e95475ed
    Minor Number                    : 0
    VBIOS Version                   : 90.02.17.00.5F
    MultiGPU Board                  : No
    Board ID                        : 0x800
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.02.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x08
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1E0410DE
        Bus Id                      : 00000000:08:00.0
        Sub System Id               : 0x250319DA
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 8x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 29 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 10989 MiB
        Used                        : 1 MiB
        Free                        : 10988 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 3 MiB
        Free                        : 253 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
        Aggregate
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 33 C
        GPU Shutdown Temp           : 94 C
        GPU Slowdown Temp           : 91 C
        GPU Max Operating Temp      : 89 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 22.14 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 280.00 W
    Clocks
        Graphics                    : 300 MHz
        SM                          : 300 MHz
        Memory                      : 405 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2100 MHz
        SM                          : 2100 MHz
        Memory                      : 7000 MHz
        Video                       : 1950 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 00000000:42:00.0
    Product Name                    : GeForce RTX 2080 Ti
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-fb3f29fb-86f4-c8d3-1258-77518bc07ff8
    Minor Number                    : 1
    VBIOS Version                   : 90.02.17.00.5F
    MultiGPU Board                  : No
    Board ID                        : 0x4200
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.02.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x42
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1E0410DE
        Bus Id                      : 00000000:42:00.0
        Sub System Id               : 0x250319DA
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 8x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 30 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 10989 MiB
        Used                        : 1 MiB
        Free                        : 10988 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 3 MiB
        Free                        : 253 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
        Aggregate
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 35 C
        GPU Shutdown Temp           : 94 C
        GPU Slowdown Temp           : 91 C
        GPU Max Operating Temp      : 89 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 8.52 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 280.00 W
    Clocks
        Graphics                    : 300 MHz
        SM                          : 300 MHz
        Memory                      : 405 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2100 MHz
        SM                          : 2100 MHz
        Memory                      : 7000 MHz
        Video                       : 1950 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 00000000:43:00.0
    Product Name                    : GeForce RTX 2070
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-2499258b-2519-1e61-06d0-f9aae805c21c
    Minor Number                    : 2
    VBIOS Version                   : 90.06.16.00.17
    MultiGPU Board                  : No
    Board ID                        : 0x4300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.02.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x43
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1F0710DE
        Bus Id                      : 00000000:43:00.0
        Sub System Id               : 0x21723842
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 0 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 7951 MiB
        Used                        : 482 MiB
        Free                        : 7469 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 4 MiB
        Free                        : 252 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 1 %
        Memory                      : 10 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
        Aggregate
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 48 C
        GPU Shutdown Temp           : 94 C
        GPU Slowdown Temp           : 91 C
        GPU Max Operating Temp      : 89 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 18.58 W
        Power Limit                 : 185.00 W
        Default Power Limit         : 185.00 W
        Enforced Power Limit        : 185.00 W
        Min Power Limit             : 105.00 W
        Max Power Limit             : 240.00 W
    Clocks
        Graphics                    : 300 MHz
        SM                          : 300 MHz
        Memory                      : 405 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2160 MHz
        SM                          : 2160 MHz
        Memory                      : 7001 MHz
        Video                       : 1950 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1518
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 341 MiB
        Process ID                  : 2251
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 138 MiB

  • [x] Docker version from docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:29:11 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:27:45 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                             Version                    Architecture Description
+++-================================-==========================-============-=========================================================
un  libgldispatch0-nvidia            <none>                     <none>       (no description available)
ii  libnvidia-cfg1-418:amd64         418.87.00-0ubuntu1         amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any               <none>                     <none>       (no description available)
un  libnvidia-common                 <none>                     <none>       (no description available)
ii  libnvidia-common-418             418.87.00-0ubuntu1         all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-410:amd64      410.104-0ubuntu0~18.04.1   amd64        NVIDIA libcompute package
ii  libnvidia-compute-418:amd64      418.87.00-0ubuntu1         amd64        NVIDIA libcompute package
rc  libnvidia-compute-430:amd64      430.40-0ubuntu0~gpu18.04.1 amd64        NVIDIA libcompute package
rc  libnvidia-compute-435:amd64      435.21-0ubuntu0~18.04.2    amd64        NVIDIA libcompute package
ii  libnvidia-container-tools        1.0.5-1                    amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64       1.0.5-1                    amd64        NVIDIA container runtime library
un  libnvidia-decode                 <none>                     <none>       (no description available)
ii  libnvidia-decode-418:amd64       418.87.00-0ubuntu1         amd64        NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                 <none>                     <none>       (no description available)
ii  libnvidia-encode-418:amd64       418.87.00-0ubuntu1         amd64        NVENC Video Encoding runtime library
un  libnvidia-fbc1                   <none>                     <none>       (no description available)
ii  libnvidia-fbc1-418:amd64         418.87.00-0ubuntu1         amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                     <none>                     <none>       (no description available)
ii  libnvidia-gl-418:amd64           418.87.00-0ubuntu1         amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                   <none>                     <none>       (no description available)
ii  libnvidia-ifr1-418:amd64         418.87.00-0ubuntu1         amd64        NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                    <none>                     <none>       (no description available)
un  nvidia-304                       <none>                     <none>       (no description available)
un  nvidia-340                       <none>                     <none>       (no description available)
un  nvidia-384                       <none>                     <none>       (no description available)
un  nvidia-390                       <none>                     <none>       (no description available)
un  nvidia-common                    <none>                     <none>       (no description available)
ii  nvidia-compute-utils-418         418.87.00-0ubuntu1         amd64        NVIDIA compute utilities
un  nvidia-container-runtime         <none>                     <none>       (no description available)
un  nvidia-container-runtime-hook    <none>                     <none>       (no description available)
ii  nvidia-container-toolkit         1.0.5-1                    amd64        NVIDIA container runtime hook
ii  nvidia-dkms-418                  418.87.00-0ubuntu1         amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel               <none>                     <none>       (no description available)
ii  nvidia-driver-418                418.87.00-0ubuntu1         amd64        NVIDIA driver metapackage
un  nvidia-driver-binary             <none>                     <none>       (no description available)
un  nvidia-kernel-common             <none>                     <none>       (no description available)
ii  nvidia-kernel-common-418         418.87.00-0ubuntu1         amd64        Shared files used with the kernel module
un  nvidia-kernel-source             <none>                     <none>       (no description available)
ii  nvidia-kernel-source-418         418.87.00-0ubuntu1         amd64        NVIDIA kernel source package
un  nvidia-legacy-340xx-vdpau-driver <none>                     <none>       (no description available)
un  nvidia-libopencl1-dev            <none>                     <none>       (no description available)
ii  nvidia-modprobe                  418.87.00-0ubuntu1         amd64        Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                <none>                     <none>       (no description available)
un  nvidia-persistenced              <none>                     <none>       (no description available)
ii  nvidia-prime                     0.8.8.2                    all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                  418.87.00-0ubuntu1         amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary           <none>                     <none>       (no description available)
un  nvidia-smi                       <none>                     <none>       (no description available)
un  nvidia-utils                     <none>                     <none>       (no description available)
ii  nvidia-utils-418                 418.87.00-0ubuntu1         amd64        NVIDIA driver support binaries
un  nvidia-vdpau-driver              <none>                     <none>       (no description available)
ii  xserver-xorg-video-nvidia-418    418.87.00-0ubuntu1         amd64        NVIDIA binary Xorg driver
  • [x] NVIDIA container library version from nvidia-container-cli -V
version: 1.0.5
build date: 2019-09-06T16:59+00:00
build revision: 13b836390888f7b7c7dca115d16d7e28ab15a836
build compiler: x86_64-linux-gnu-gcc-7 7.4.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Nothing was logged.

  • [x] Docker command, image and tag used

docker run --gpus all nvidia/cuda:10.1-runtime nvidia-smi

cttsai1985, Sep 12 '19

Just wondering whether you eventually solved this; I'm running into the same problem.

Thanks!

yucolabjames, Sep 24 '19

@yucolabjames No, not at all. In the end I detached the NVLink bridge so I could keep things rolling. I also tried several of the troubleshooting suggestions on this forum, but none of them worked.

cttsai1985, Sep 24 '19

Reproduced the same behavior on Ubuntu 18.04 after running 'apt update' on 10/2/2019. Removing the NVLink bridge worked as a workaround.

mattcurf, Oct 02 '19

Reproduced the same behavior under Debian 10. Any further progress on this?

bluebox42, Jan 27 '20

I have not dug into this issue any further; what works for me for now is switching to an Intel platform.

cttsai1985, Jan 27 '20

Can anybody else with that problem supply the output of nvidia-bug-report.sh (see https://github.com/NVIDIA/nvidia-docker/issues/1180)?

bluebox42, Jan 30 '20

Hello!

Sorry for the lack of support on this. Having the output of nvidia-bug-report.sh would be super helpful for debugging this (likely) driver issue. Thanks!

RenaudWasTaken, Feb 10 '20

Same problem here. Is there any solution?

jiangxiaobin96, Jul 20 '21

@jiangxiaobin96 The NVLink devices are currently not mounted from the host into the container by default. You could try adding --device /dev/nvidia-nvlink to your docker command line (as well as any /dev/nvidia-nvswitch* devices that may exist).
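A small Python sketch of this suggestion: it globs for the device nodes named above and appends one --device flag per node to the docker run command from this report. The helper names are hypothetical, and whether passing these nodes actually avoids the timeout is untested here; a later comment in this thread notes it should normally be unnecessary when fabric manager is running on the host.

```python
import glob
from typing import Iterable, List

# Device nodes mentioned in the suggestion above; "*" matches any nvswitch index.
NVLINK_PATTERNS = ("/dev/nvidia-nvlink", "/dev/nvidia-nvswitch*")


def find_nvlink_devices() -> List[str]:
    """Return the NVLink/NVSwitch device nodes that exist on this host."""
    found = set()
    for pattern in NVLINK_PATTERNS:
        # glob returns an empty list for patterns that match nothing.
        found.update(glob.glob(pattern))
    return sorted(found)


def device_args(devices: Iterable[str]) -> List[str]:
    """Turn device paths into repeated '--device <path>' docker arguments."""
    args: List[str] = []
    for dev in devices:
        args += ["--device", dev]
    return args


def docker_cmd(image: str, command: List[str], devices: Iterable[str]) -> List[str]:
    """Build the docker run argv from this report plus extra --device flags."""
    return ["docker", "run", "--gpus", "all", *device_args(devices), image, *command]
```

The resulting list can be passed straight to subprocess.run, e.g. subprocess.run(docker_cmd("nvidia/cuda:10.1-runtime", ["nvidia-smi"], find_nvlink_devices())). On a host without these nodes the command degrades to the original one.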

Also as mentioned in the comments, the output of nvidia-bug-report.sh would be helpful.

elezar, Jul 20 '21

In general it shouldn’t be necessary to inject these into a container. So long as you have fabric manager running on the host and the fabric manager socket injected in the container (which libnvidia-container should do for you) then things should work as expected regarding nvswitches/nvlinks.

klueska, Jul 20 '21