nvidia-docker
nvidia-container-cli reports incorrect CUDA driver version on WSL2
1. Issue or feature description
nvidia-container-cli on WSL2 is reporting CUDA 11.0 (and thus refusing to run containers with cuda>=11.1) even though CUDA toolkit 11.1 is installed in Linux. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work (for example, docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark).
Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.
2. Steps to reproduce the issue
- Install Windows 10 on the insider program with a version at or later than 20251.fe_release.201030-1438
- Install the Windows CUDA drivers from here (this is 460.20 for me)
- Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
- Run nvidia-smi on the host - it should give a CUDA version of 11.2.
- Check that docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs benchmarks
- In Linux, run nvidia-container-cli info. It incorrectly outputs CUDA version 11.0 (see the side-by-side check sketched below).
This command will also fail:
$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
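For a quick side-by-side check of the two reported versions, a minimal sketch (the grep patterns simply match the output lines quoted elsewhere in this thread):
# Driver view inside WSL2, as reported by nvidia-smi (11.2 here)
nvidia-smi | grep 'CUDA Version'
# libnvidia-container view, as reported by nvidia-container-cli (11.0 here)
nvidia-container-cli info | grep 'CUDA version'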
3. Information to attach (optional if deemed irrelevant)
- [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info (attached as ncc.txt)
- [x] Kernel version from uname -a:
Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- [ ] Any relevant kernel output lines from dmesg
- [x] Driver information from nvidia-smi -a (attached as nvidia-smi.txt)
- [x] Docker version from docker version: 19.03.13
- [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*' (attached as packages.txt)
- [x] NVIDIA container library version from nvidia-container-cli -V (attached as ncc-version.txt)
- [ ] NVIDIA container library logs (see troubleshooting)
- [x] Docker command, image and tag used (output attached as docker-run.txt):
$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
Same here:
Status: Downloaded newer image for nvidia/cuda:10.2-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.
@opptimus seems to have a different issue, but the original issue may be related to: https://github.com/NVIDIA/libnvidia-container/issues/117#issuecomment-725373082
@klueska To be fair, @opptimus' issue is the one I actually bumped into to start with. It was only after further digging that I realised nvidia-container-cli was also reporting the wrong version. I may be putting the cart before the horse, I'm pretty new to this :)
@danfairs I solved my problem by upgrading my Win10 to version 20257.1, following the official WSL2 guidelines.
Hey @danfairs . Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.
In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.
docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
For reference: here is the merge request extending WSL support.
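For context, the check that NVIDIA_DISABLE_REQUIRE skips is driven by the NVIDIA_REQUIRE_CUDA environment variable baked into the nvidia/cuda images, which nvidia-container-cli compares against the driver-reported CUDA version. A minimal sketch to see what a given image requires (image tag taken from the report above):
# Print the image's environment and pull out its declared CUDA requirement
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
    nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 | grep NVIDIA_REQUIRE_CUDA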
Hi. I have a problem with nvidia-container-cli. I run this:
archee8@DESKTOP-HR2MA0D:~$ docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
@archee8 which version of the NVIDIA container toolkit is this?
Version 1.4.0 of libnvidia-container should address this issue.
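A quick way to confirm which version is installed, and to upgrade if it is older than 1.4.0, is sketched below (package names are the ones referenced in this thread plus nvidia-container-toolkit):
# Show installed and candidate versions of the container library packages
apt-cache policy libnvidia-container-tools libnvidia-container1
# Upgrade them (and the toolkit) if needed
sudo apt-get update && sudo apt-get install --only-upgrade \
    libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit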
archee8@DESKTOP-HR2MA0D:~$ sudo apt-cache policy libnvidia-container-tools
libnvidia-container-tools:
Installed: 1.4.0-1
@archee8 Your issue appears to be related to this: https://github.com/NVIDIA/nvidia-docker/issues/1496#issuecomment-830285200
The following command works, but it doesn't work with docker-compose. Does anyone know the cause?
docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
I have the following environment. I'm on Ubuntu 16.04 because it cannot be upgraded due to company security policy.
⋊> ~ lsb_release -a 13:29:20
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial
⋊> ~ docker --version 13:29:20
Docker version 20.10.7, build f0df350
⋊> ~ docker-compose --version 13:29:38
docker-compose version 1.29.2, build unknown
⋊> ~ nvidia-container-cli info 13:30:27
NVRM version: 440.118.02
CUDA version: 10.2
Device Index: 0
Device Minor: 0
Model: TITAN X (Pascal)
Brand: GeForce
GPU UUID: GPU-fcae2b3c-b6c0-c0c6-1eef-4f25809d16f9
Bus Location: 00000000:01:00.0
Architecture: 6.1
⋊> ~
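On the docker-compose question: the --gpus flag has no direct equivalent in docker-compose 1.29.x, so the GPU has to be requested through a device reservation and NVIDIA_DISABLE_REQUIRE passed via environment. A minimal sketch under those assumptions (service name and file contents are illustrative and untested here; requires docker-compose >= 1.28):
# Write an illustrative compose file and bring it up
cat > docker-compose.yml <<'EOF'
version: "3.8"
services:
  cuda:
    image: nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04
    command: nvidia-smi
    environment:
      - NVIDIA_DISABLE_REQUIRE=1   # same workaround as the docker run example above
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1             # or device_ids for a specific GPU
              capabilities: [gpu]
EOF
docker-compose up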
This issue is still present when following the current instructions on the official nvidia documentation for this: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#ch05-running-containers
While trying to run https://github.com/borisdayma/dalle-mini in WSL2 I encountered the same error message as @danfairs
root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace dalle-mini:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a newer version, or use an earlier cuda container: unknown.
When I check my currently installed version with nvidia-smi I see that I have version 11.7 installed (the error message above requires 11.6):
root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# nvidia-smi
Mon Jun 13 23:34:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05 Driver Version: 516.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:26:00.0 On | N/A |
| 0% 38C P8 8W / 175W | 1082MiB / 8192MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I'm kinda stuck right now. Any advice?
@psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:
docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest
I ran into this issue and this workaround worked. Thank you @elezar