nvidia-device-plugin daemonset has 0 desired and no pod is launched

Open blackjack2015 opened this issue 2 years ago • 7 comments

Thanks for the brilliant tool for deploying GPU-enabled pods with k8s. I have successfully installed all the prerequisites (docker, nvidia-docker2 and kubernetes). My system and software information is as follows:

GPU device: Nvidia GeForce 2070 SUPER
Driver version: 515.48.07
Docker version: 20.10.17
Kubernetes version: 1.24.2

The /etc/docker/daemon.json has been edited as follows:

[image: screenshot of /etc/docker/daemon.json, not preserved]

I have also checked that nvidia docker runs successfully with "docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi".

I then deployed "nvidia-device-plugin-daemonset" with the following command:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml

Next I checked the daemonset status with "kubectl get daemonset -A" and got: [image: screenshot of the output, not preserved]

The pod information is: [image: screenshot of the output, not preserved]

It seems that no pod of "nvidia-device-plugin" is launched.
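
For anyone hitting the same symptom, here is a minimal sketch of how the DaemonSet status fields can be read to tell "no node is eligible" apart from "pods were scheduled but are not ready". It assumes the JSON comes from "kubectl get daemonset <name> -n <namespace> -o json"; the helper name daemonset_summary and the sample object are made up for illustration, while desiredNumberScheduled and numberReady are standard DaemonSet status fields.

```python
def daemonset_summary(ds):
    """Summarise a DaemonSet object (a dict parsed from kubectl's JSON output)."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    if desired == 0:
        # The controller found no node that the pod template is allowed to run on.
        return "no eligible node: check node taints, tolerations and nodeSelector"
    if ready < desired:
        return f"{ready}/{desired} pods ready: inspect pod events and logs"
    return f"{ready}/{desired} pods ready"


# Hypothetical sample mirroring the situation in this issue (0 desired).
sample = {
    "metadata": {"name": "nvidia-device-plugin-daemonset"},
    "status": {"desiredNumberScheduled": 0, "numberReady": 0},
}
print(daemonset_summary(sample))
```

With "kubectl get daemonset -A -o json" the result is a list, so the helper would be applied to each entry of its "items" array.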

Would you mind giving some suggestions to solve this? Thank you!

blackjack2015 avatar Jun 25 '22 08:06 blackjack2015

Hi, were you able to solve this ? @blackjack2015 I am stuck in the same spot.

aditya2803 avatar Oct 17 '22 19:10 aditya2803

Double-check that pods can be scheduled to your node. I forgot to remove the node-role.kubernetes.io/control-plane taint and was having this problem.
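
A rough sketch of that check, assuming the node JSON comes from "kubectl get nodes -o json" and the tolerations are copied from the DaemonSet's pod template. The function names and the sample data are hypothetical, and the sample toleration (nvidia.com/gpu, Exists, NoSchedule) is only an assumption about what the manifest contains. The matching follows the usual Kubernetes rules: an empty key with operator Exists matches any taint, and an empty effect matches any effect.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    op = toleration.get("operator", "Equal")  # Equal is the default operator
    key = toleration.get("key")
    if key and key != taint["key"]:
        return False
    if not key and op != "Exists":
        return False  # an empty key is only valid together with Exists
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if op == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def blocking_taints(node, tolerations):
    """List the taints on a node that none of the given tolerations cover."""
    taints = node.get("spec", {}).get("taints") or []
    return [
        t for t in taints
        if t["effect"] in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Hypothetical single-node cluster that still carries the control-plane taint.
node = {"spec": {"taints": [
    {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"},
]}}
# Assumed toleration from the device plugin's pod template.
tolerations = [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]

for taint in blocking_taints(node, tolerations):
    print("blocks scheduling:", taint["key"], taint["effect"])
```

If the control-plane taint shows up here on a single-node cluster, it can be removed with "kubectl taint nodes --all node-role.kubernetes.io/control-plane-" (the trailing dash removes the taint).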

anibali avatar Nov 17 '22 01:11 anibali

> Double-check that pods can be scheduled to your node. I forgot to remove the node-role.kubernetes.io/control-plane taint and was having this problem.

I have certainly confirmed this. My current solution is simply to use kubernetes 1.22 instead of 1.24. The point is that versions above 1.22 use containerd as the default container runtime, while 1.22 uses dockerd.
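
To see which runtime a node is actually using, the containerRuntimeVersion field in the node status can be read. The sketch below assumes node objects parsed from "kubectl get nodes -o json"; the helper name container_runtime and the sample nodes are made up for illustration, and the containerd version string is a placeholder.

```python
def container_runtime(node):
    """Return the runtime name (e.g. 'containerd' or 'docker') reported by a node."""
    version = (
        node.get("status", {})
        .get("nodeInfo", {})
        .get("containerRuntimeVersion", "")
    )
    # The field has the form '<runtime>://<version>'.
    name, sep, _ = version.partition("://")
    return name if sep else "unknown"


# Hypothetical samples; only the docker version is taken from this issue.
nodes = [
    {"status": {"nodeInfo": {"containerRuntimeVersion": "containerd://1.6.6"}}},
    {"status": {"nodeInfo": {"containerRuntimeVersion": "docker://20.10.17"}}},
]
for n in nodes:
    print(container_runtime(n))
```

If a node reports containerd, the settings in /etc/docker/daemon.json do not affect pods started by the kubelet, so the NVIDIA runtime would have to be configured for containerd instead.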

blackjack2015 avatar Apr 06 '23 07:04 blackjack2015

Hey! Any update on this issue? @blackjack2015 @anibali

varskann avatar May 05 '23 07:05 varskann

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Feb 28 '24 04:02 github-actions[bot]