k8s-device-plugin

Error: failed to start container "nvidia-device-plugin-ctr": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: signal: segmentation fault (core dumped), stdout: , stderr: \\\"\"": unknown

Open wxitzxg opened this issue 5 years ago • 15 comments

[root@k8s-node1 docker]# nvidia-docker version
NVIDIA Docker: 2.3.0
Client: Docker Engine - Community
 Version:           19.03.9
 API version:       1.39
 Go version:        go1.13.10
 Git commit:        9d988398e7
 Built:             Fri May 15 00:25:27 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.6
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       481bc77
  Built:            Sat May 4 02:02:43 2019
  OS/Arch:          linux/amd64
  Experimental:     false

wxitzxg avatar May 19 '20 09:05 wxitzxg

Can you tell me what the results of running the following commands are:

$ nvidia-container-runtime-hook
$ nvidia-container-toolkit

klueska avatar May 19 '20 10:05 klueska

I'm also curious what OS you are on (e.g. centos8, ubuntu18.04, etc.). I'm trying to determine whether this is related to https://github.com/NVIDIA/nvidia-docker/issues/1280#issuecomment-630754999 or is something different.
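One way to collect the OS details asked about here is to read `/etc/os-release` and look for a WSL marker in the kernel version string. This is only a minimal sketch and was not posted in the thread: the function name `describe_os` and its two optional path arguments are hypothetical, and the `microsoft` substring check assumes a WSL kernel identifies itself that way in `/proc/version`.

```shell
#!/bin/sh
# Hypothetical helper: print the distro ID and version, plus " (WSL)" when the
# kernel version string mentions Microsoft.
# Both arguments are optional and exist only so the function can be tested
# against sample files.
describe_os() {
    os_release="${1:-/etc/os-release}"
    proc_version="${2:-/proc/version}"
    # os-release is a KEY=value file that can be sourced by the shell.
    # Source it in a subshell so its variables do not leak into the caller.
    (
        . "$os_release"
        printf '%s%s' "${ID:-unknown}" "${VERSION_ID:-}"
    )
    # Assumption: WSL kernels include "microsoft" (any case) in /proc/version.
    if grep -qi microsoft "$proc_version" 2>/dev/null; then
        printf ' (WSL)\n'
    else
        printf '\n'
    fi
}
```

Calling `describe_os` with no arguments reads the real system files.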

klueska avatar May 19 '20 11:05 klueska

New packages have been published that should resolve this issue. Please run one of the following depending on your platform:

sudo apt-get install nvidia-container-toolkit
sudo yum install nvidia-container-toolkit

If you originally installed nvidia-docker2 and not nvidia-container-toolkit, you should still run the commands above in order to update nvidia-docker2 properly (it depends on nvidia-container-toolkit, which will now be upgraded).
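For a script that has to handle both platforms, the choice between the two commands above could be sketched as follows. The helper names `toolkit_update_cmd` and `detect_pkg_manager` are hypothetical, and the first one only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the update command from the comment above for a
# given package manager name. It only prints; it does not install anything.
toolkit_update_cmd() {
    case "$1" in
        apt-get|yum)
            printf 'sudo %s install nvidia-container-toolkit\n' "$1"
            ;;
        *)
            printf 'unsupported package manager: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: report which of the two package managers is on PATH.
detect_pkg_manager() {
    for pm in apt-get yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            printf '%s\n' "$pm"
            return 0
        fi
    done
    return 1
}
```

A possible use is `toolkit_update_cmd "$(detect_pkg_manager)"`, which prints the matching command so it can be reviewed before running it.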

klueska avatar May 19 '20 21:05 klueska

Please confirm if the new packages resolve your issue and close this issue if so.

klueska avatar May 20 '20 10:05 klueska

Hello, after I installed the nvidia-container-toolkit package, I still get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout:, stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n \\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
$ nvidia-container-runtime-hook
Usage of nvidia-container-runtime-hook:
  -config string
        configuration file
  -debug
        enable debug output

Commands:
  prestart
        run the prestart hook
  poststart
        no-op
  poststop
        no-op

$ nvidia-container-toolkit
Usage of nvidia-container-toolkit:
  -config string
        configuration file
  -debug
        enable debug output

Commands:
  prestart
        run the prestart hook
  poststart
        no-op
  poststop
        no-op

My system is: Ubuntu 20.04 (Windows10 WSL2)

What can I do to solve this problem? @klueska

Hsuey avatar Nov 28 '20 15:11 Hsuey

@Hsuey What nvidia-driver version do you have installed?

klueska avatar Dec 01 '20 12:12 klueska

@klueska The version I installed is CUDA Toolkit 11.1.

Hsuey avatar Dec 01 '20 14:12 Hsuey

@klueska How can I resolve this problem?

Hsuey avatar Dec 04 '20 10:12 Hsuey

I just saw that your system is Ubuntu 20.04 (Windows10 WSL2). I'm not that familiar with debugging issues on WSL2. Hopefully @dualvtable can help.

klueska avatar Dec 04 '20 10:12 klueska

@dualvtable Can you help me to resolve this problem?

Hsuey avatar Dec 16 '20 11:12 Hsuey

As far as I know, running the device plugin on WSL2 is not yet supported. @dualvtable can comment more, but I'm pretty sure it won't work because the device plugin requires NVML, which is not available in a WSL2 environment.
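A quick way to see whether an NVML library is even visible to the dynamic linker is to search the linker cache for `libnvidia-ml.so`. This is a rough sketch only: the helper name `nvml_listed` is hypothetical, and finding the library does not by itself mean the driver is loaded or that the device plugin will work.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the linker-cache listing supplied on stdin
# mentions the NVML shared library (libnvidia-ml.so).
nvml_listed() {
    grep -q 'libnvidia-ml\.so'
}
```

It is meant to be fed the output of `ldconfig -p`, for example `ldconfig -p | nvml_listed && echo "NVML library found"`.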

klueska avatar Dec 16 '20 11:12 klueska

@klueska But according to this document, https://docs.nvidia.com/cuda/wsl-user-guide/index.html, CUDA can run in WSL2. I failed to install it, though, and I don't know what went wrong. My problem: when I run this command, I get the error below.

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled

Hsuey avatar Dec 16 '20 11:12 Hsuey

Is this document fake? @klueska

Hsuey avatar Dec 17 '20 03:12 Hsuey

Hi @Hsuey - what is the Windows NVIDIA driver version that you have on the system? Did you download at least 465.12 from https://developer.nvidia.com/cuda/wsl and then follow the steps as described in the user guide?
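The "at least 465.12" check can be expressed as a version comparison. The sketch below assumes GNU `sort` with the `-V` (version sort) option is available; the function name `version_ge` is hypothetical, and the installed driver version has to be supplied by hand, since how to read it on a given system is outside what the thread describes.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Assumption: GNU sort with -V is available. The smaller of the two versions
# sorts first, so $1 >= $2 exactly when the first sorted line equals $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 460.89 465.12` fails, while `version_ge 465.12 465.12` succeeds.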

dualvtable avatar Dec 17 '20 06:12 dualvtable

@dualvtable Yes, I installed the GeForce driver (465.12_gameready_win10-dch_64bit_international.exe).

Hsuey avatar Dec 17 '20 08:12 Hsuey

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Feb 29 '24 04:02 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Mar 31 '24 04:03 github-actions[bot]