k8s-device-plugin: nvidia-device-plugin DaemonSet has 0 desired and no pod is launched

Status: Open. blackjack2015 opened this issue 2 years ago (7 comments).

Thanks for this brilliant tool for deploying GPU-enabled pods with Kubernetes. I have successfully installed all the prerequisites (including Docker, nvidia-docker2, and Kubernetes). My system and software details are as follows:

- GPU: NVIDIA GeForce RTX 2070 SUPER
- Driver version: 515.48.07
- Docker version: 20.10.17
- Kubernetes version: 1.24.2

I edited /etc/docker/daemon.json as follows:

I also verified that Docker with the NVIDIA runtime works: "docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi" runs successfully.
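The exact daemon.json contents are not reproduced in this thread. For reference, here is a minimal Python sketch of a check that a daemon.json makes "nvidia" the default Docker runtime, assuming the layout documented for nvidia-docker2 (a top-level "default-runtime" key plus a "runtimes.nvidia.path" entry). The function name is hypothetical, and the check only matters when the kubelet actually drives Docker.

```python
import json


def nvidia_is_default_runtime(daemon_json_text: str) -> bool:
    """Return True if a Docker daemon.json makes 'nvidia' the default runtime.

    Hypothetical helper. Assumes the nvidia-docker2 layout: a top-level
    "default-runtime" key and a "runtimes" map whose "nvidia" entry has a
    non-empty "path" (usually "nvidia-container-runtime").
    """
    config = json.loads(daemon_json_text)
    nvidia = config.get("runtimes", {}).get("nvidia")
    # Registering the runtime is not enough: it must also be the default,
    # because the kubelet does not start containers with --gpus.
    return (
        config.get("default-runtime") == "nvidia"
        and isinstance(nvidia, dict)
        and bool(nvidia.get("path"))
    )


example = """
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  }
}
"""
print(nvidia_is_default_runtime(example))
```

On a real host the argument would be the text of /etc/docker/daemon.json, e.g. open("/etc/docker/daemon.json").read().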

I then deployed the nvidia-device-plugin DaemonSet with:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml

Checking the DaemonSet status with "kubectl get daemonset -A" gave:

The pod information is:

It seems that no "nvidia-device-plugin" pod has been launched.
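The DESIRED column of "kubectl get daemonset" reflects the DaemonSet's status.desiredNumberScheduled field: the number of nodes the DaemonSet controller considers eligible to run the pod. A value of 0 means no node passed its checks (node selector, affinity, taints and tolerations), so no pod is ever created; this is different from a pod being created and then failing. The sketch below is a minimal Python helper, assuming the JSON output of "kubectl get daemonset -A -o json"; the function name and the sample data are hypothetical and are not the reporter's actual output.

```python
import json


def zero_desired_daemonsets(kubectl_json: str) -> list[str]:
    """Return "namespace/name" for every DaemonSet with no eligible node.

    Expects the List object printed by `kubectl get daemonset -A -o json`.
    """
    result = []
    for ds in json.loads(kubectl_json).get("items", []):
        meta = ds.get("metadata", {})
        status = ds.get("status", {})
        # desiredNumberScheduled backs the DESIRED column in `kubectl get daemonset`.
        if status.get("desiredNumberScheduled", 0) == 0:
            result.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return result


# Hypothetical sample shaped like kubectl's output.
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {"metadata": {"namespace": "kube-system", "name": "nvidia-device-plugin-daemonset"},
         "status": {"desiredNumberScheduled": 0, "currentNumberScheduled": 0, "numberReady": 0}},
        {"metadata": {"namespace": "kube-system", "name": "kube-proxy"},
         "status": {"desiredNumberScheduled": 1, "currentNumberScheduled": 1, "numberReady": 1}},
    ],
})
print(zero_desired_daemonsets(sample))  # ['kube-system/nvidia-device-plugin-daemonset']
```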

Would you mind giving some suggestions to solve this? Thank you!

Jun 25 '22 08:06 blackjack2015

Hi @blackjack2015, were you able to solve this? I am stuck at the same point.

Oct 17 '22 19:10 aditya2803

Double-check that pods can be scheduled on your node. I had forgotten to remove the node-role.kubernetes.io/control-plane taint, and that was causing this problem for me.
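To see why a leftover control-plane taint yields 0 desired, the sketch below encodes a simplified version of the Kubernetes toleration-matching rule (it ignores tolerationSeconds) and lists the NoSchedule/NoExecute taints that a set of tolerations does not cover. It assumes, as a labeled assumption, that the device plugin manifest only tolerates the "nvidia.com/gpu" taint; the function names and sample taints are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified Kubernetes toleration/taint match (tolerationSeconds ignored)."""
    # A toleration with an empty effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # "Exists" with an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint.get("key")
    # "Equal": key and value must both match (value defaults to "").
    return (toleration.get("key") == taint.get("key")
            and toleration.get("value", "") == taint.get("value", ""))


def blocking_taints(node_taints: list[dict], tolerations: list[dict]) -> list[dict]:
    """Taints that keep a DaemonSet pod off the node.

    PreferNoSchedule is only a soft preference, so it is not reported.
    """
    return [
        t for t in node_taints
        if t.get("effect") in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Hypothetical single-node cluster whose control-plane node is still tainted.
node_taints = [{"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"}]
# Assumed device-plugin toleration.
tolerations = [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]
print(blocking_taints(node_taints, tolerations))
```

If the taint is reported, either remove it (for example "kubectl taint nodes --all node-role.kubernetes.io/control-plane-", where the trailing "-" deletes the taint) or add a matching toleration to the DaemonSet.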

Nov 17 '22 01:11 anibali


I have confirmed this. My current workaround is simply to use Kubernetes 1.22 instead of 1.24. As far as I can tell, the key difference is that Kubernetes 1.24 removed dockershim, so the kubelet uses containerd as the container runtime, while 1.22 can still use dockerd.
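When the kubelet talks to containerd directly, the nvidia runtime configured in /etc/docker/daemon.json is not consulted; containerd needs its own runtime configuration. The runtime a node actually uses is reported in status.nodeInfo.containerRuntimeVersion (also shown in the CONTAINER-RUNTIME column of "kubectl get nodes -o wide") in the form "<runtime>://<version>". The minimal Python sketch below parses that string; the helper names and the sample values are hypothetical.

```python
def runtime_of(container_runtime_version: str) -> tuple[str, str]:
    """Split a node's containerRuntimeVersion, e.g. "containerd://1.6.6"."""
    name, sep, version = container_runtime_version.partition("://")
    if not sep:
        raise ValueError(f"unexpected format: {container_runtime_version!r}")
    return name, version


def docker_config_applies(container_runtime_version: str) -> bool:
    """/etc/docker/daemon.json only affects pods when the kubelet drives Docker."""
    name, _ = runtime_of(container_runtime_version)
    return name == "docker"


# Hypothetical values as they might appear on a 1.24 node and a 1.22 node.
for value in ("containerd://1.6.6", "docker://20.10.17"):
    print(value, docker_config_applies(value))
```

The value for a given node can be read with "kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'".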

Apr 06 '23 07:04 blackjack2015

Hey! Any update on this issue? @blackjack2015 @anibali

May 05 '23 07:05 varskann

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

Feb 28 '24 04:02 github-actions[bot]
