GPU pod remains Pending
I’m trying to prepare GPU worker nodes and enable GPU support on Kubernetes. I followed the steps in the README file (link), but the pod always remains Pending and never runs. I tried CUDA 10, as in the tutorial, and then switched to CUDA 12, but it still does not work.
1. Quick Debug Information
- OS/Version : Ubuntu 22.04.4 LTS (Jammy Jellyfish)
- cuda version : 12.2
- NVIDIA-SMI 535.183.01, Driver Version: 535.183.01, CUDA Version: 12.2
- Server type: Nvidia L40S : link
- Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): Docker version 27.1.1, build 6312585
- Docker Compose version v2.29.1
- CRI-O version: 1.24.6
- nvidia-container-toolkit version (1.16.0-1).
- kubectl version: Client Version: v1.30.3, Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3, Server Version: v1.30.0
- minikube version: v1.33.1
- helm Version:"v3.15.3"
2. Issue or feature description
Events: Type Reason Age From Message
Warning FailedScheduling 26m (x150 over 12h) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
3.
kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
gpu-demo-vectoradd   0/1     Pending   0          12h
gpu-operator-test    0/1     Pending   0          13h
gpu-operator-test1   0/1     Pending   0          13h
gpu-pod              0/1     Pending   0          13h
`kubectl describe pod gpu-pod
Name: gpu-pod
Namespace: default
Priority: 0
Service Account: default
Node:
IPs:
Warning FailedScheduling 26m (x150 over 12h) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.`
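The `Insufficient nvidia.com/gpu` message means the scheduler sees no node advertising the `nvidia.com/gpu` resource. A quick way to confirm this could look like the sketch below. The function names `gpu_allocatable` and `has_schedulable_gpu` are hypothetical helpers, not part of any NVIDIA or Kubernetes tooling; only `kubectl` and `awk` are assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: print each node with the GPU count it advertises.
# "<none>" in the second column means the node does not expose
# nvidia.com/gpu to the scheduler at all.
gpu_allocatable() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    --no-headers
}

# Hypothetical helper: succeed only if at least one node reports
# a GPU count greater than zero.
has_schedulable_gpu() {
  gpu_allocatable | awk '$2 != "<none>" && $2 + 0 > 0 { found = 1 } END { exit !found }'
}
```

Running `has_schedulable_gpu || echo "no node advertises nvidia.com/gpu"` after sourcing the snippet would show whether the device plugin has registered any GPUs. If it has not, GPU pods are expected to stay Pending regardless of the CUDA version in the image.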
Did you deploy nvidia-device-plugin via helm? If so, which helm chart are you using? I am currently facing the same problem after upgrading from 0.14.0 to 0.16.1.
@imenselmi / @FelixMertin could you please provide the logs for the k8s-device-plugin device-plugin container?
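For anyone landing here, the requested logs could be collected with something like the sketch below. The default namespace `kube-system` and label selector `name=nvidia-device-plugin-ds` are assumptions based on the static DaemonSet manifest; a Helm or GPU Operator install will likely use a different namespace and labels, so pass your own values. `device_plugin_logs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: fetch recent logs from the device plugin pods.
#   $1: namespace (assumed default: kube-system)
#   $2: label selector (assumed default: name=nvidia-device-plugin-ds)
# Both defaults are assumptions; check your own deployment first.
device_plugin_logs() {
  ns="${1:-kube-system}"
  selector="${2:-name=nvidia-device-plugin-ds}"
  kubectl logs -n "$ns" -l "$selector" --all-containers --tail=200
}
```

If the defaults do not match, `kubectl get pods -A | grep -i nvidia` can help find the right namespace, and `kubectl get pods -n <namespace> --show-labels` shows the labels to use as the selector.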
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity.