kind-with-gpus-examples
Unable to start pods for the device plugin
Hi Kevin,
I am new to Kubernetes, so please bear with me if my question seems basic. I have successfully created a cluster and installed the device plugin, but the daemonset is not being scheduled: it reports 0 desired nodes, and there are no pods in the nvidia namespace. The details are below. Could you help me understand why the daemonset is not being scheduled? Thank you!
$ kubectl --context=kind-${KIND_CLUSTER_NAME} get pod -n nvidia
No resources found in nvidia namespace.
$ kubectl describe daemonset nvidia-device-plugin -n nvidia --context=kind-${KIND_CLUSTER_NAME}
Name:           nvidia-device-plugin
Selector:       app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=nvidia-device-plugin
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=nvidia-device-plugin
                app.kubernetes.io/version=0.15.0
                helm.sh/chart=nvidia-device-plugin-0.15.0
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: nvidia-device-plugin
                meta.helm.sh/release-namespace: nvidia
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app.kubernetes.io/instance=nvidia-device-plugin
           app.kubernetes.io/name=nvidia-device-plugin
  Containers:
   nvidia-device-plugin-ctr:
    Image:      nvcr.io/nvidia/k8s-device-plugin:v0.15.0
    Port:       <none>
    Host Port:  <none>
    Command:
      nvidia-device-plugin
    Environment:
      MPS_ROOT:                    /run/nvidia/mps
      NVIDIA_MIG_MONITOR_DEVICES:  all
      NVIDIA_VISIBLE_DEVICES:      all
      NVIDIA_DRIVER_CAPABILITIES:  compute,utility
    Mounts:
      /dev/shm from mps-shm (rw)
      /mps from mps-root (rw)
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/cdi from cdi-root (rw)
  Volumes:
   device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
   mps-root:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps
    HostPathType:  DirectoryOrCreate
   mps-shm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps/shm
    HostPathType:
   cdi-root:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/cdi
    HostPathType:  DirectoryOrCreate
  Priority Class Name:  system-node-critical
  Node-Selectors:       <none>
  Tolerations:          CriticalAddonsOnly op=Exists
                        nvidia.com/gpu:NoSchedule op=Exists
Events:                 <none>