Update queueing strategy API comments for the 2-phase Admission
What happened: The SecondPass scheduling mechanism takes workloads from both the ClusterQueue heads and the second pass queue in https://github.com/kubernetes-sigs/kueue/blob/5665ed27860264ee482c7270fa8627b8e2719a73/pkg/queue/manager.go#L650-L651.
For the BestEffortFIFO strategy, this sounds reasonable, since the strategy allows Kueue to admit subsequent workloads when the head workload is inadmissible.
However, with the StrictFIFO strategy, if there are any workloads in the second pass queue, queueManager should not take subsequent workloads from the ClusterQueue, so that the order is strictly guaranteed.
What you expected to happen: In the strict FIFO strategy mode, Kueue must always admit the head workload first.
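The expectation above can be sketched roughly as follows. This is only an illustration of the behaviour the report asks for, not Kueue's actual code: the `clusterQueue` type, the `heads` function, and the `secondPass` map (ClusterQueue name to workloads waiting in the second pass queue) are all hypothetical names made up for this example.

```go
package main

import "fmt"

// QueueingStrategy mirrors the two strategy names discussed in this issue.
type QueueingStrategy string

const (
	StrictFIFO     QueueingStrategy = "StrictFIFO"
	BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
)

// clusterQueue is a hypothetical, simplified stand-in for a ClusterQueue.
type clusterQueue struct {
	name     string
	strategy QueueingStrategy
	pending  []string // workloads waiting for quota, in FIFO order
}

// heads returns the workloads handed to the next scheduling cycle under the
// behaviour proposed in the report. secondPass is a hypothetical map from
// ClusterQueue name to the workloads of that queue in the second pass queue.
func heads(cqs []clusterQueue, secondPass map[string][]string) []string {
	var out []string
	for _, cq := range cqs {
		waiting := secondPass[cq.name]
		// Second-pass workloads are always offered to the scheduler.
		out = append(out, waiting...)
		// Proposed rule: a StrictFIFO queue with second-pass workloads
		// does not release its next pending workload.
		if cq.strategy == StrictFIFO && len(waiting) > 0 {
			continue
		}
		if len(cq.pending) > 0 {
			out = append(out, cq.pending[0])
		}
	}
	return out
}

func main() {
	cqs := []clusterQueue{
		{name: "strict", strategy: StrictFIFO, pending: []string{"wl2"}},
		{name: "best-effort", strategy: BestEffortFIFO, pending: []string{"wl4"}},
	}
	secondPass := map[string][]string{
		"strict":      {"wl1"},
		"best-effort": {"wl3"},
	}
	fmt.Println(heads(cqs, secondPass)) // prints [wl1 wl3 wl4]
}
```

In this sketch, wl2 is held back because wl1 from the same StrictFIFO queue is still in the second pass, while the BestEffortFIFO queue releases wl4 alongside wl3.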
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Kueue version (use git describe --tags --dirty --always):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
I'm not understanding why this is a bug.
I think BestEffortFIFO and StrictFIFO are meant to control quota reservation. In the second pass of scheduling, the workloads already have quota reserved. Note that even before the second pass of scheduling was introduced, admission of a workload was gated by the AdmissionChecks, and a workload would be admitted independently of BestEffortFIFO / StrictFIFO as soon as all checks were green. I think this is a similar case.
For example, even without the second scheduling pass, you can have a Provisioning flavor in the StrictFIFO queue, and two workloads in the queue: wl1 (queued first), wl2 (queued second).
1. wl1 is queued, then wl2
2. wl1 has quota reserved and starts to wait for its AC
3. wl2 has quota reserved and starts to wait for its AC
4. wl2 is admitted before wl1 because its AC is provisioned before the AC for wl1 (this is likely to happen, for example if it requested fewer nodes)
5. wl1 is admitted once its AC is provisioned
So, the order of getting quota was FIFO, but not admission.
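The sequence above can be illustrated with a small simulation. This is a toy model, not Kueue code: the `workload` type and the `queuedAt` and `acReadyAt` fields are hypothetical, and the timings are invented only to reproduce the scenario where the AdmissionCheck for wl2 turns ready first.

```go
package main

import (
	"fmt"
	"sort"
)

// workload is a hypothetical, simplified model of a queued workload.
type workload struct {
	name      string
	queuedAt  int // position in the StrictFIFO queue (lower is earlier)
	acReadyAt int // assumed time at which its AdmissionCheck turns ready
}

// names extracts the workload names in slice order.
func names(wls []workload) []string {
	out := make([]string, 0, len(wls))
	for _, w := range wls {
		out = append(out, w.name)
	}
	return out
}

// quotaReservationOrder models StrictFIFO: quota is reserved in queue order.
func quotaReservationOrder(wls []workload) []string {
	sorted := append([]workload(nil), wls...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].queuedAt < sorted[j].queuedAt
	})
	return names(sorted)
}

// admissionOrder models admission after quota reservation: a workload is
// admitted as soon as its AdmissionCheck is ready, regardless of queue order.
func admissionOrder(wls []workload) []string {
	sorted := append([]workload(nil), wls...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].acReadyAt < sorted[j].acReadyAt
	})
	return names(sorted)
}

func main() {
	wls := []workload{
		{name: "wl1", queuedAt: 1, acReadyAt: 10}, // larger request, slower AC
		{name: "wl2", queuedAt: 2, acReadyAt: 5},  // fewer nodes, faster AC
	}
	fmt.Println("quota reservation:", quotaReservationOrder(wls))
	fmt.Println("admission:", admissionOrder(wls))
}
```

With these assumed timings, quota is reserved for wl1 and then wl2, while admission happens for wl2 and then wl1.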
Yes, that's right for the AC case. In particular, in step 5, wl1 will be admitted by the workload controller instead of the scheduler, since a workload with quota reserved is not considered an inadmissible workload.
I assumed that SecondPass scheduling admission (not quota reservation) still depends on FIFO. However, if we define that the queueing strategy is respected only for quota reservation, not admission, we should update the API comment and documentation:
- https://github.com/kubernetes-sigs/kueue/blob/adee4646e2d3a4dc92597388cfa527b72e13647a/apis/kueue/v1beta1/clusterqueue_types.go#L81-L94
- https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#queueing-strategy
/remove-kind bug
/kind cleanup
/retitle Update queueing strategy API comments for the 2-phase Admission
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale