[MultiKueue] Report a ClusterQueue as inactive (misconfigured) if there is ProvReq used with MK
What would you like to be added:
Validation for a ClusterQueue that has both a MultiKueue (MK) and a ProvisioningRequest (ProvReq) admission check configured.
Why is this needed:
Provisioning nodes on the management cluster does not make sense. We want to fail fast and warn the user about money potentially wasted on scaling up the cluster.
Proposed approach:
Use a mechanism similar to the one here: https://github.com/kubernetes-sigs/kueue/pull/1635.
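To make the target concrete, below is a minimal sketch of the combination this validation would flag: a ClusterQueue referencing both a MultiKueue and a ProvisioningRequest AdmissionCheck. The object names (sample-multikueue, sample-provisioning, sample-prov-config) are made up for illustration; only the controllerName values correspond to the in-tree controllers.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-multikueue          # hypothetical name
spec:
  controllerName: kueue.x-k8s.io/multikueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-provisioning        # hypothetical name
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: sample-prov-config       # hypothetical ProvisioningRequestConfig
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  # resourceGroups omitted for brevity.
  # Listing both checks below is the combination that should deactivate
  # the ClusterQueue on the management cluster.
  admissionChecks:
  - sample-multikueue
  - sample-provisioning
```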
/assign @trasc
/cc @alculquicondor
I reviewed https://github.com/kubernetes-sigs/kueue/pull/2047, and I think we could follow the pattern here.
The AdmissionCheck condition would be CompatibleWithMultiKueue, and the reason for the inactive ClusterQueue could be AdmissionCheckNonCompatibleWithMultiKueue. We would do the check inside updateWithAdmissionChecks, as we do for the other checks.
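For illustration only, assuming the condition and reason names above (neither exists in Kueue today), the surfaced status could look roughly like this; the messages are made up and the layout simply mirrors existing condition shapes:

```yaml
# Hypothetical condition on the ProvisioningRequest AdmissionCheck
status:
  conditions:
  - type: CompatibleWithMultiKueue
    status: "False"
    reason: UsedWithMultiKueue          # illustrative reason
    message: "ProvisioningRequest checks cannot be combined with MultiKueue on the same ClusterQueue"
---
# Hypothetical resulting condition on the ClusterQueue
status:
  conditions:
  - type: Active
    status: "False"
    reason: AdmissionCheckNonCompatibleWithMultiKueue
    message: "Can't admit new workloads: admission check incompatible with MultiKueue"
```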
The only problem is that the condition would be specific to MultiKueue. What if other checks need similar semantics with respect to other checks?
I would rather sit on this one for now until we observe more admission checks, in-tree or out-of-tree.
What if other checks need similar semantics with respect to other checks?
Right, this approach cannot be used for arbitrary pairs of admission checks. However, MultiKueue seems to be more than just an admission check; for example, it has a global configuration in the config map (link).
I would rather sit on this one for now until we observe more admission checks, in-tree or out-of-tree.
I see, but it can take a long time until we have other pairs of AdmissionChecks which don't like each other, and having some protection before graduating MK and ProvReq to Beta would be nice.
The approach using the existing mechanism should be very quick to implement, and if one day we have a more generic mechanism, developed for the needs of other AC pairs, then we could switch to it.
Let's wait and see
/assign
/unassign
/assign
@mimowo I think we don't have a proper design for this. And it hasn't proved to be very useful. Should we close it?
I'm ok to close it until we revisit the design or see some evidence of users running into this issue.
/close
@alculquicondor: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/reopen
I believe that with the recent changes (https://github.com/kubernetes-sigs/kueue/pull/3254) to make the cache aware of the MultiKueue and ProvisioningRequest AdmissionChecks, we can easily validate this condition. cc @mbobrovskyi @mszadkow
@mimowo: Reopened this issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hitting this and unsure why this is happening. Any insights?
Kubectl describe workload
Status:
  Conditions:
    Last Transition Time:  2025-06-16T21:48:02Z
    Message:               ClusterQueue cluster-queue is inactive
    Observed Generation:   1
    Reason:                Inadmissible
    Status:                False
    Type:                  QuotaReserved
k get clusterqueue
NAME            COHORT   PENDING WORKLOADS
cluster-queue            31
k get localqueue
NAME               CLUSTERQUEUE    PENDING WORKLOADS   ADMITTED WORKLOADS
multislice-queue   cluster-queue   31                  0
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
@samos123 what is your CQ configuration? Maybe provide the entire kubectl describe for the CQ
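In cases like this, the ClusterQueue's own Active condition usually carries the concrete reason the queue was deactivated, so that is the part of the describe output to look at. An illustrative status shape (not taken from this report):

```yaml
# Illustrative ClusterQueue status; the actual reason/message depend on the cause
# (e.g. a missing ResourceFlavor or a misconfigured/missing AdmissionCheck).
status:
  conditions:
  - type: Active
    status: "False"
    reason: FlavorNotFound          # example reason, hypothetical here
    message: "Can't admit new workloads: references missing ResourceFlavor(s)."
```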
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale