
0/1 nodes are available: 1 Insufficient nvidia.com/gpu.

Open liufangpeng opened this issue 2 years ago • 3 comments

Fri Feb 17 16:56:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40-4Q       On   | 00000000:02:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

liufangpeng avatar Feb 17 '23 08:02 liufangpeng

I am using k8s-device-plugin:1.9, and kubectl describe node shows nvidia.com/gpu 0 0, but the error still occurs.
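A zero count for nvidia.com/gpu on the node means the scheduler has no GPUs to hand out, which matches the "Insufficient nvidia.com/gpu" message in the title. As a minimal sketch for checking this without reading the describe output by eye, the helper below pulls the GPU counts out of a node object as returned by `kubectl get node <name> -o json`. The function name `gpu_counts` is hypothetical and only for illustration; the `status.capacity` and `status.allocatable` fields are the standard Node fields.

```python
import json

# Extended resource name advertised by the NVIDIA device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_counts(node: dict, resource: str = GPU_RESOURCE) -> dict:
    """Return the capacity and allocatable count of `resource` for one node.

    `node` is a single Node object decoded from `kubectl get node <name> -o json`.
    A missing entry is treated as 0, which is what the scheduler effectively sees.
    (Hypothetical helper, not part of kubectl or the device plugin.)
    """
    status = node.get("status", {})
    return {
        "capacity": int(status.get("capacity", {}).get(resource, "0")),
        "allocatable": int(status.get("allocatable", {}).get(resource, "0")),
    }


def gpu_counts_from_json(text: str) -> dict:
    """Same as gpu_counts, but takes the raw JSON text of a single node."""
    return gpu_counts(json.loads(text))
```

For example, the output of `kubectl get node <name> -o json` can be saved to a file and passed to `gpu_counts_from_json`. If both values are 0, the device plugin has not registered any GPUs with the kubelet, so the plugin log is the next place to look.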

liufangpeng avatar Feb 17 '23 08:02 liufangpeng

The plugin reports the following errors:
2023/02/17 09:19:13 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2023/02/17 09:19:13 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2023/02/17 09:19:13 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2023/02/17 09:19:13 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2023/02/17 09:19:13 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
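The gRPC status Unimplemented means the server answered but does not offer the requested service. In this log, the kubelet socket was reachable but has no service named deviceplugin.Registration. One possible cause, which is an assumption and not confirmed in this thread, is that the plugin build and the kubelet speak different versions of the device plugin API. A minimal sketch for picking this case out of a plugin log follows; the function names are hypothetical, and the regular expression only assumes the `rpc error: code = ... desc = ...` format shown in the log.

```python
import re

# Matches the gRPC error format seen in the plugin log:
#   rpc error: code = <Code> desc = <description>
RPC_ERROR = re.compile(r"rpc error: code = (?P<code>\w+) desc = (?P<desc>.+)$")


def parse_rpc_error(line: str):
    """Return (code, description) if the line contains a gRPC error, else None."""
    match = RPC_ERROR.search(line)
    if match is None:
        return None
    return match.group("code"), match.group("desc").strip()


def registration_unimplemented(line: str) -> bool:
    """True if the kubelet rejected registration because the service is unknown.

    Hypothetical helper: it flags the symptom only and does not prove the cause.
    """
    parsed = parse_rpc_error(line)
    if parsed is None:
        return False
    code, desc = parsed
    return code == "Unimplemented" and "Registration" in desc
```

Running `registration_unimplemented` over each line of the plugin log separates this failure from other registration errors, such as a missing socket or a permissions problem, which would carry a different status code.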

liufangpeng avatar Feb 17 '23 09:02 liufangpeng

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Feb 28 '24 04:02 github-actions[bot]