Queue resources are sufficient, but no pod is created and the job stays pending
What happened: The queue has sufficient resources, but no pod is created and the job stays pending.
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
queue yaml:
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      creationTimestamp: "2023-11-22T08:20:13Z"
      generation: 1
      name: tmp-nv3090-13
      resourceVersion: "4125550"
      uid: e71ea131-c3c5-420e-b611-b2a6930912f4
    spec:
      capability:
        cpu: "99999"
        memory: 999999Gi
        momenta.ai/rdma: "9999"
        nvidia.com/gpu: "9999"
      reclaimable: false
      weight: 1
    status:
      allocated:
        cpu: "0"
        memory: "0"
      pending: 1
      reservation: {}
      state: Open
event:
    queue resource quota insufficient
    0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
Environment:
- Volcano Version:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
The job only started scheduling normally after I removed these two entries from the queue configuration: momenta.ai/rdma: "9999" and nvidia.com/gpu: "9999". Why is that?
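For anyone who wants to double-check the numbers, here is a minimal sketch that compares a job's resource request against the queue capability posted above, one resource name at a time. The job request in it is hypothetical, since the job YAML has not been posted in this thread, and the helper names `parse_quantity` and `exceeded_dimensions` are made up for illustration. It is plain arithmetic and does not reproduce Volcano's scheduling logic; it only shows that a request of this size does not numerically exceed the configured capability.

```python
# Hypothetical helper, NOT Volcano's scheduler logic: a per-resource arithmetic check.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity):
    """Parse a small subset of Kubernetes quantities:
    plain numbers, the 'm' (milli) suffix, and binary suffixes Ki/Mi/Gi/Ti."""
    text = str(quantity)
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    if text.endswith("m"):
        return float(text[:-1]) / 1000
    return float(text)


def exceeded_dimensions(request, capability):
    """Return the resource names whose requested amount exceeds the capability.
    Assumption: a resource absent from `capability` is treated as uncapped."""
    return [
        name
        for name, wanted in request.items()
        if name in capability
        and parse_quantity(wanted) > parse_quantity(capability[name])
    ]


# Capability copied from the queue YAML in this issue.
capability = {
    "cpu": "99999",
    "memory": "999999Gi",
    "momenta.ai/rdma": "9999",
    "nvidia.com/gpu": "9999",
}

# Hypothetical job request (the real job YAML is not in this thread).
job_request = {"cpu": "8", "memory": "64Gi", "nvidia.com/gpu": "1"}

print(exceeded_dimensions(job_request, capability))  # prints []
```

An empty list here only rules out a simple numeric shortfall. It says nothing about how the scheduler treats resource names such as momenta.ai/rdma that are listed in the capability, which is the open question in this issue.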
What's the scheduler config and job yaml?
I ran into the same situation; sometimes a restart fixes it. Is there a good solution for this now?