nvidia-container-cli reports incorrect CUDA driver version on WSL2

Open danfairs opened this issue 4 years ago • 14 comments

1. Issue or feature description

nvidia-container-cli on WSL2 is reporting CUDA 11.0 (and thus refusing to run containers with cuda>=11.1) even though CUDA toolkit 11.1 is installed in Linux. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work (for example docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark).

Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.

2. Steps to reproduce the issue

  1. Install Windows 10 via the Insider Program, at build 20251.fe_release.201030-1438 or later
  2. Install the Windows CUDA drivers from here (this is 460.20 for me)
  3. Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
  4. Run nvidia-smi on the host - it should give a CUDA version of 11.2.
  5. Check docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs benchmarks
  6. In Linux, run nvidia-container-cli info. It incorrectly outputs CUDA version 11.0.
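
For reference, steps 4-6 amount to running the following inside the WSL2 Ubuntu instance (these are the same commands listed above; exact output will of course vary by setup):

$ nvidia-smi
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
$ nvidia-container-cli info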

This command will also fail:

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.

3. Information to attach (optional if deemed irrelevant)

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info ncc.txt

  • [x] Kernel version from uname -a Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • [ ] Any relevant kernel output lines from dmesg

  • [x] Driver information from nvidia-smi -a nvidia-smi.txt

  • [x] Docker version from docker version 19.03.13

  • [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*' packages.txt

  • [x] NVIDIA container library version from nvidia-container-cli -V ncc-version.txt

  • [ ] NVIDIA container library logs (see troubleshooting)

  • [x] Docker command, image and tag used

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1 docker-run.txt
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.

danfairs avatar Nov 08 '20 11:11 danfairs

Same here.

Status: Downloaded newer image for nvidia/cuda:10.2-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.

opptimus avatar Nov 12 '20 02:11 opptimus

@opptimus seems to have a different issue, but the original issue may be related to: https://github.com/NVIDIA/libnvidia-container/issues/117#issuecomment-725373082

klueska avatar Nov 12 '20 09:11 klueska

@klueska To be fair, @opptimus' issue is the one I actually bumped into to start with. It was only after further digging that I realised nvidia-container-cli was also reporting the wrong version. I may be putting the cart before the horse, I'm pretty new to this :)

danfairs avatar Nov 12 '20 13:11 danfairs

@danfairs I solved my problem by upgrading my Win10 to version 20257.1, following the official WSL2 guidelines.

opptimus avatar Nov 20 '20 11:11 opptimus

Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.

In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

For reference: here is the merge request extending WSL support.

elezar avatar Feb 12 '21 09:02 elezar

Hi. I have a problem with nvidia-container-cli. I ran this:

archee8@DESKTOP-HR2MA0D:~$ docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

archee8 avatar May 04 '21 20:05 archee8

@archee8 which version of the NVIDIA container toolkit is this?

Version 1.4.0 of libnvidia-container should address this issue.

elezar avatar May 05 '21 05:05 elezar

archee8@DESKTOP-HR2MA0D:~$ sudo apt-cache policy libnvidia-container-tools
libnvidia-container-tools:
  Installed: 1.4.0-1

archee8 avatar May 05 '21 08:05 archee8

@archee8 Your issue appears to be related to this: https://github.com/NVIDIA/nvidia-docker/issues/1496#issuecomment-830285200

klueska avatar May 05 '21 09:05 klueska

The following command works, but the same setup doesn't work with docker-compose. Does anyone know the cause? (A rough compose sketch follows after the environment details below.)

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

I have the following environment. The reason for Ubuntu 16.04 is that it cannot be upgraded due to company security issues.

⋊> ~ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.7 LTS
Release:        16.04
Codename:       xenial
⋊> ~ docker --version
Docker version 20.10.7, build f0df350
⋊> ~ docker-compose --version
docker-compose version 1.29.2, build unknown
⋊> ~ nvidia-container-cli info
NVRM version:   440.118.02
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          TITAN X (Pascal)
Brand:          GeForce
GPU UUID:       GPU-fcae2b3c-b6c0-c0c6-1eef-4f25809d16f9
Bus Location:   00000000:01:00.0
Architecture:   6.1
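
Here is a minimal docker-compose sketch mirroring the docker run workaround above (untested; it assumes docker-compose 1.28 or newer, which the 1.29.2 shown here satisfies, plus the Compose file 3.x GPU device-reservation syntax; the service name gpu-test is just a placeholder):

# Untested sketch: write a compose file that requests the GPU and sets
# NVIDIA_DISABLE_REQUIRE, then bring the service up. "gpu-test" is a
# placeholder name; the device-reservation block needs docker-compose 1.28+.
cat > docker-compose.yml <<'EOF'
version: "3.8"
services:
  gpu-test:
    image: nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04
    command: nvidia-smi
    environment:
      - NVIDIA_DISABLE_REQUIRE=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
EOF
docker-compose up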

Keiku avatar Mar 24 '22 04:03 Keiku

This issue is still present when following the current instructions on the official nvidia documentation for this: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#ch05-running-containers

andresgalaviz avatar Apr 01 '22 00:04 andresgalaviz

While trying to run https://github.com/borisdayma/dalle-mini in WSL2 I encountered the same error message as @danfairs

root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace dalle-mini:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a newer version, or use an earlier cuda container: unknown.

When I check my currently installed version with nvidia-smi, I see that I have CUDA 11.7 installed (the error message above requires 11.6):

root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# nvidia-smi
Mon Jun 13 23:34:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 516.01       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:26:00.0  On |                  N/A |
|  0%   38C    P8     8W / 175W |   1082MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I'm kinda stuck right now. Any advice?

psychofisch avatar Jun 13 '22 21:06 psychofisch

@psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:

docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest

elezar avatar Jun 14 '22 09:06 elezar

I ran into this issue and this workaround worked. Thank you @elezar

TheFrator avatar Sep 21 '22 21:09 TheFrator