k8s-device-plugin
nvidia-device-plugin daemonset has 0 desired and no pod is launched
Thanks for this brilliant tool for deploying GPU-enabled pods with Kubernetes. I have successfully installed all the prerequisites (including Docker, nvidia-docker2, and Kubernetes). Some system and software information is as follows:
- GPU device: NVIDIA GeForce RTX 2070 SUPER
- Driver version: 515.48.07
- Docker version: 20.10.17
- Kubernetes version: 1.24.2
The /etc/docker/daemon.json has been edited as follows:
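(The pasted snippet did not survive; for reference, a typical /etc/docker/daemon.json that registers the NVIDIA runtime and makes it the default looks like the following — the binary path assumes a standard nvidia-container-runtime installation:)

```json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```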

I have also checked that the NVIDIA runtime works under Docker with "docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi".
I then executed the following command to deploy "nvidia-device-plugin-daemonset":
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
Then I checked the daemonset status with "kubectl get daemonset -A" and got:
The pod information is:
It seems that no "nvidia-device-plugin" pod has been launched.
Would you mind giving some suggestions to solve this? Thank you!
Hi, were you able to solve this ? @blackjack2015 I am stuck in the same spot.
Double-check that pods can be scheduled to your node. I forgot to remove the node-role.kubernetes.io/control-plane taint and was having this problem.
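For anyone hitting the same thing, untainting looks roughly like this (the second command is for older clusters that still use the legacy taint key — check "kubectl describe node" to see which taints your node actually carries):

```shell
# Allow scheduling on the control-plane node; the trailing "-" removes the taint
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
# Legacy taint key used by older Kubernetes releases
kubectl taint nodes --all node-role.kubernetes.io/master-
```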
I have certainly confirmed this. My current workaround is just to use Kubernetes 1.22 instead of 1.24. The point is that versions above 1.22 use containerd as the default container runtime, while 1.22 uses dockerd.
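For anyone who wants to stay on 1.24: with containerd, the NVIDIA runtime has to be configured in /etc/containerd/config.toml rather than /etc/docker/daemon.json. A minimal sketch, assuming a config version 2 file and the standard nvidia-container-runtime install path (adjust for your system), followed by a containerd restart:

```toml
# /etc/containerd/config.toml (config schema version 2)
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  # Make the NVIDIA runtime the default so GPU pods work without extra annotations
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      # Assumed install location of nvidia-container-runtime
      BinaryName = "/usr/bin/nvidia-container-runtime"
```

Then restart containerd ("sudo systemctl restart containerd") and re-create the device-plugin daemonset.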
Hey! Any update on this issue? @blackjack2015 @anibali
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.