Queue resources are sufficient, but the pod has not been created, and the job has been pending.

Open qutianhang1 opened this issue 2 years ago • 3 comments

What happened: The queue has sufficient resources, but no pod is created and the job stays pending.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

queue yaml:

    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      creationTimestamp: "2023-11-22T08:20:13Z"
      generation: 1
      name: tmp-nv3090-13
      resourceVersion: "4125550"
      uid: e71ea131-c3c5-420e-b611-b2a6930912f4
    spec:
      capability:
        cpu: "99999"
        memory: 999999Gi
        momenta.ai/rdma: "9999"
        nvidia.com/gpu: "9999"
      reclaimable: false
      weight: 1
    status:
      allocated:
        cpu: "0"
        memory: "0"
      pending: 1
      reservation: {}
      state: Open
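For readers who want to check the "resources are sufficient" claim against the posted numbers, here is a minimal Python sketch that subtracts `status.allocated` from `spec.capability` for each resource. The values are copied from the queue YAML above. The helper names `parse_quantity` and `headroom` are hypothetical, and the parser handles only plain integers and the Ki/Mi/Gi/Ti suffixes. This is plain arithmetic on the declared values and does not show how the Volcano scheduler itself evaluates queue quota.

```python
# Sanity check on the numbers posted in the queue YAML.
# parse_quantity and headroom are hypothetical helpers, not Volcano APIs.

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(text: str) -> int:
    """Parse a small subset of Kubernetes quantities: integers and Ki/Mi/Gi/Ti."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


# spec.capability and status.allocated, copied from the issue.
capability = {
    "cpu": "99999",
    "memory": "999999Gi",
    "momenta.ai/rdma": "9999",
    "nvidia.com/gpu": "9999",
}
allocated = {"cpu": "0", "memory": "0"}


def headroom(capability: dict, allocated: dict) -> dict:
    """Return capability minus allocated per resource; missing allocations count as 0."""
    return {
        name: parse_quantity(cap) - parse_quantity(allocated.get(name, "0"))
        for name, cap in capability.items()
    }


if __name__ == "__main__":
    for name, free in headroom(capability, allocated).items():
        print(f"{name}: {free}")
```

Every listed resource has positive headroom, which matches the reporter's reading of the queue. The declared capability alone therefore does not account for the "queue resource quota insufficient" event.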

event: queue resource quota insufficient 0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable

Environment:

  • Volcano Version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

qutianhang1 avatar Nov 22 '23 08:11 qutianhang1

After I removed the two entries momenta.ai/rdma: "9999" and nvidia.com/gpu: "9999" from the queue configuration, the job started scheduling normally. Why is that?

qutianhang1 avatar Nov 22 '23 15:11 qutianhang1

What's the scheduler config and job yaml?

Monokaix avatar Nov 24 '23 03:11 Monokaix

I ran into the same situation; sometimes a restart fixes it. Is there a good solution for this now?

jorahbi avatar May 30 '24 10:05 jorahbi

Hello 👋 Looks like there was no activity on this issue for last 180 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 90 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Apr 25 '25 23:04 stale[bot]