sematic
Add a timeout for worker pods that fail to schedule
Sometimes pods fail to schedule, and k8s has no way to tell you whether they ever will. We should fail once a pod has been stuck in this state for a certain amount of time, and surface a meaningful error explaining why the pod failed to schedule.
Example status from the pod YAML for a pod that can't be scheduled:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-13T22:53:47Z"
    message: '0/3 nodes are available: 3 Insufficient cpu.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
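
A minimal sketch of how the check could work, assuming the pod status is available as a plain dict shaped like the example above. The function name `unschedulable_error` and the 10-minute default are hypothetical and not part of sematic; the sketch only illustrates detecting a `PodScheduled` condition that has been `Unschedulable` for longer than a timeout and reusing the scheduler's message in the error.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default; the issue does not specify a timeout value.
UNSCHEDULABLE_TIMEOUT = timedelta(minutes=10)


def unschedulable_error(
    status: dict,
    now: datetime,
    timeout: timedelta = UNSCHEDULABLE_TIMEOUT,
) -> Optional[str]:
    """Return an error message if the pod has been unschedulable for at
    least `timeout`, otherwise None."""
    if status.get("phase") != "Pending":
        return None
    for cond in status.get("conditions") or []:
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            # Timestamps in the pod status are UTC, e.g. "2022-10-13T22:53:47Z".
            since = datetime.strptime(
                cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            if now - since >= timeout:
                return (
                    f"Pod has been unschedulable for at least {timeout}: "
                    f"{cond.get('message')}"
                )
    return None


# Usage with the status from the example above.
example_status = {
    "conditions": [
        {
            "lastProbeTime": None,
            "lastTransitionTime": "2022-10-13T22:53:47Z",
            "message": "0/3 nodes are available: 3 Insufficient cpu.",
            "reason": "Unschedulable",
            "status": "False",
            "type": "PodScheduled",
        }
    ],
    "phase": "Pending",
}
error = unschedulable_error(example_status, now=datetime.now(timezone.utc))
if error is not None:
    print(error)
```

Note that `lastTransitionTime` records when the condition last changed, so this treats it as the moment the pod became unschedulable. Whether that is the right clock to measure against is a design choice left open here.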