Add timeout if worker pod fails to schedule after a timeout period

Open: augray opened this issue 2 years ago • 1 comment

Sometimes pods fail to schedule, and k8s has no way to tell you whether they ever will. We should fail once a pod has been stuck in this state for a certain amount of time, and surface a meaningful error about why it failed to schedule.

augray, Oct 07 '22 16:10

Example status from pod yaml for a pod that can't be scheduled:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-13T22:53:47Z"
    message: '0/3 nodes are available: 3 Insufficient cpu.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

augray, Oct 17 '22 23:10
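
One way the proposed check could look is sketched below. It is only an illustration, not Sematic's actual implementation: the function names, the `timeout` parameter, and the choice to measure the stuck period from `lastTransitionTime` are all assumptions. The input is the pod's `status` block parsed into a plain dict, shaped like the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_unschedulable_condition(status: dict) -> Optional[dict]:
    """Return the PodScheduled condition if the pod is Pending and unschedulable."""
    if status.get("phase") != "Pending":
        return None
    for cond in status.get("conditions") or []:
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return cond
    return None


def scheduling_timeout_error(
    status: dict, now: datetime, timeout: timedelta
) -> Optional[str]:
    """Return an error message if the pod has been unschedulable longer than `timeout`.

    Assumption: `lastTransitionTime` marks when the pod became unschedulable.
    """
    cond = find_unschedulable_condition(status)
    if cond is None:
        return None
    # Kubernetes timestamps look like "2022-10-13T22:53:47Z" (UTC).
    since = datetime.strptime(
        cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now - since < timeout:
        return None
    message = cond.get("message") or "no message provided"
    return f"Pod failed to schedule within {timeout}: {message}"


# The status from the example above, as a dict.
example_status = {
    "conditions": [
        {
            "lastProbeTime": None,
            "lastTransitionTime": "2022-10-13T22:53:47Z",
            "message": "0/3 nodes are available: 3 Insufficient cpu.",
            "reason": "Unschedulable",
            "status": "False",
            "type": "PodScheduled",
        }
    ],
    "phase": "Pending",
}

# Hypothetical values: check at 23:10 UTC with a 10 minute timeout.
check_time = datetime(2022, 10, 13, 23, 10, 0, tzinfo=timezone.utc)
error = scheduling_timeout_error(example_status, check_time, timedelta(minutes=10))
```

Including the scheduler's own `message` in the error keeps the cause (here, insufficient CPU) visible to the user. A fixed timeout may still give up on pods that would have scheduled later, for example once more nodes become available, so the value would likely need to be tunable.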