kind-with-gpus-examples

Unable to start pods for the device plugin

Open · zhewenhu opened this issue 8 months ago • 2 comments

Hi Kevin,

I am new to Kubernetes, so please bear with me if my question seems basic. I have successfully created a cluster and installed the device plugin, but the DaemonSet is not scheduling any pods: the `nvidia` namespace is empty and the DaemonSet reports 0 desired nodes. The details are below. Could you help me understand why the DaemonSet is not being scheduled? Thank you!

$ kubectl --context=kind-${KIND_CLUSTER_NAME} get pod -n nvidia
No resources found in nvidia namespace.
$ kubectl describe daemonset nvidia-device-plugin -n nvidia --context=kind-${KIND_CLUSTER_NAME}
Name:           nvidia-device-plugin
Selector:       app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=nvidia-device-plugin
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=nvidia-device-plugin
                app.kubernetes.io/version=0.15.0
                helm.sh/chart=nvidia-device-plugin-0.15.0
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: nvidia-device-plugin
                meta.helm.sh/release-namespace: nvidia
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app.kubernetes.io/instance=nvidia-device-plugin
           app.kubernetes.io/name=nvidia-device-plugin
  Containers:
   nvidia-device-plugin-ctr:
    Image:      nvcr.io/nvidia/k8s-device-plugin:v0.15.0
    Port:       <none>
    Host Port:  <none>
    Command:
      nvidia-device-plugin
    Environment:
      MPS_ROOT:                    /run/nvidia/mps
      NVIDIA_MIG_MONITOR_DEVICES:  all
      NVIDIA_VISIBLE_DEVICES:      all
      NVIDIA_DRIVER_CAPABILITIES:  compute,utility
    Mounts:
      /dev/shm from mps-shm (rw)
      /mps from mps-root (rw)
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/cdi from cdi-root (rw)
  Volumes:
   device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:  
   mps-root:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps
    HostPathType:  DirectoryOrCreate
   mps-shm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps/shm
    HostPathType:  
   cdi-root:
    Type:               HostPath (bare host directory volume)
    Path:               /var/run/cdi
    HostPathType:       DirectoryOrCreate
  Priority Class Name:  system-node-critical
  Node-Selectors:       <none>
  Tolerations:          CriticalAddonsOnly op=Exists
                        nvidia.com/gpu:NoSchedule op=Exists
Events:                 <none>

zhewenhu · Jun 14 '24 07:06