
[MultiKueue] Report a ClusterQueue as inactive (misconfigured) if there is ProvReq used with MK

Open mimowo opened this issue 1 year ago • 18 comments

What would you like to be added:

Add validation for a ClusterQueue that has both a MultiKueue (MK) and a ProvisioningRequest (ProvReq) admission check configured.

Why is this needed:

Provisioning nodes on the management cluster does not make sense. We want to fail fast and warn the user about money possibly wasted on scaling up the cluster.

Proposed approach:

Use a mechanism similar to the one here: https://github.com/kubernetes-sigs/kueue/pull/1635.
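For illustration, the misconfiguration this would catch looks roughly like the following (the check names are hypothetical; the idea is that one referenced AdmissionCheck is backed by the MultiKueue controller and another by the ProvisioningRequest controller):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  admissionChecks:
  - multikueue-check   # AdmissionCheck backed by the MultiKueue controller
  - prov-req-check     # AdmissionCheck backed by the ProvisioningRequest controller
```

With both checks on the same queue, ProvReq would scale up nodes on the management cluster for workloads that MultiKueue actually dispatches to a worker cluster.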

mimowo avatar Apr 19 '24 14:04 mimowo

/assign @trasc /cc @alculquicondor

mimowo avatar Apr 19 '24 14:04 mimowo

I reviewed https://github.com/kubernetes-sigs/kueue/pull/2047, and I think we could follow the pattern here.

The AdmissionCheck condition would be CompatibleWithMultiKueue, and the reason for an inactive ClusterQueue could be AdmissionCheckNonCompatibleWithMultiKueue. We would perform the check inside updateWithAdmissionChecks, as for the other checks.
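A minimal sketch of what that check could do, assuming the ClusterQueue's admission checks have been resolved to the names of the controllers backing them. The `admissionCheck` type and `incompatiblePair` helper are illustrative, not Kueue's actual code; the controller name constants follow the values Kueue registers for these two controllers:

```go
package main

import "fmt"

// Controller names as registered by Kueue for the two built-in controllers
// (assumption based on Kueue's conventions; verify against the release in use).
const (
	multiKueueController = "kueue.x-k8s.io/multikueue"
	provReqController    = "kueue.x-k8s.io/provisioning-request"
)

// admissionCheck is a minimal stand-in for an AdmissionCheck referenced by a
// ClusterQueue, reduced to the fields the validation needs.
type admissionCheck struct {
	Name           string
	ControllerName string
}

// incompatiblePair reports whether the ClusterQueue's admission checks mix
// MultiKueue with ProvisioningRequest. If they do, the queue should be marked
// inactive instead of provisioning nodes on the management cluster.
func incompatiblePair(checks []admissionCheck) bool {
	var hasMK, hasProvReq bool
	for _, ac := range checks {
		switch ac.ControllerName {
		case multiKueueController:
			hasMK = true
		case provReqController:
			hasProvReq = true
		}
	}
	return hasMK && hasProvReq
}

func main() {
	checks := []admissionCheck{
		{Name: "multikueue-check", ControllerName: multiKueueController},
		{Name: "prov-req-check", ControllerName: provReqController},
	}
	if incompatiblePair(checks) {
		fmt.Println("ClusterQueue inactive: reason AdmissionCheckNonCompatibleWithMultiKueue")
	}
}
```

The scan is a single pass over the checks, so it fits naturally into an existing per-queue update path such as updateWithAdmissionChecks.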

mimowo avatar Apr 26 '24 08:04 mimowo

The only problem is that the condition would be specific to MultiKueue. What if other checks need similar semantics against other checks?

I would rather sit on this one for now until we observe more admission checks, in-tree or out-of-tree.

alculquicondor avatar Apr 26 '24 13:04 alculquicondor

What if other checks need similar semantics against others?

Right, this approach cannot be used for arbitrary pairs of admission checks. However, MultiKueue seems to be more than just an admission check. For example, it has global configuration in the config map (link).

I would rather sit on this one for now until we observe more admission checks, in-tree or out-of-tree.

I see, but it can take a long time until we have other pairs of AdmissionChecks which don't like each other, and having some protection before graduating MK and ProvReq to Beta would be nice.

The approach using the existing mechanism should be very quick to implement, and if one day we have a more generic mechanism, developed for the needs of other AC pairs, then we could switch to it.

mimowo avatar May 07 '24 16:05 mimowo

Let's wait and see

alculquicondor avatar May 07 '24 17:05 alculquicondor

/assign

vladikkuzn avatar May 16 '24 16:05 vladikkuzn

/unassign

vladikkuzn avatar May 17 '24 07:05 vladikkuzn

/assign

bouaouda-achraf avatar Jul 07 '24 19:07 bouaouda-achraf

@mimowo I don't think we have a proper design for this, and it hasn't proved to be very useful. Should we close it?

alculquicondor avatar Jul 08 '24 12:07 alculquicondor

I'm ok to close it until we revisit the design or see some evidence of users running into this issue.

mimowo avatar Jul 08 '24 13:07 mimowo

/close

alculquicondor avatar Jul 08 '24 14:07 alculquicondor

@alculquicondor: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 08 '24 14:07 k8s-ci-robot

/reopen

I believe with the recent changes (https://github.com/kubernetes-sigs/kueue/pull/3254) to make the cache aware of the MultiKueue and ProvisioningRequest AdmissionChecks, we can easily validate these conditions. cc @mbobrovskyi @mszadkow

mimowo avatar Dec 06 '24 15:12 mimowo

@mimowo: Reopened this issue.

In response to this:

/reopen

I believe with the recent changes (https://github.com/kubernetes-sigs/kueue/pull/3254) to make the cache aware of the MultiKueue and ProvisioningRequest AdmissionChecks, we can easily validate these conditions. cc @mbobrovskyi @mszadkow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Dec 06 '24 15:12 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 06 '25 16:03 k8s-triage-robot

/remove-lifecycle stale

mimowo avatar Mar 06 '25 16:03 mimowo

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 04 '25 16:06 k8s-triage-robot

Hitting this and unsure why it is happening. Any insights?

kubectl describe workload:

Status:
  Conditions:
    Last Transition Time:  2025-06-16T21:48:02Z
    Message:               ClusterQueue cluster-queue is inactive
    Observed Generation:   1
    Reason:                Inadmissible
    Status:                False
    Type:                  QuotaReserved

kubectl get clusterqueue:

NAME            COHORT   PENDING WORKLOADS
cluster-queue            31

kubectl get localqueue:

NAME               CLUSTERQUEUE    PENDING WORKLOADS   ADMITTED WORKLOADS
multislice-queue   cluster-queue   31                  0

samos123 avatar Jun 16 '25 21:06 samos123

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 16 '25 22:07 k8s-triage-robot

/remove-lifecycle rotten

mimowo avatar Aug 07 '25 08:08 mimowo

@samos123 what is your CQ configuration? Could you provide the entire kubectl describe output for the CQ?
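For anyone hitting the same symptom: the reason a queue is inactive normally surfaces on the ClusterQueue itself rather than on the workloads, so inspecting its Active condition is the quickest check (commands assume the queue name from the report above):

kubectl describe clusterqueue cluster-queue
kubectl get clusterqueue cluster-queue \
  -o jsonpath='{.status.conditions[?(@.type=="Active")].message}'

The message on the Active condition should name the misconfiguration, e.g. a missing flavor or a broken AdmissionCheck reference.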

mimowo avatar Aug 07 '25 08:08 mimowo

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 05 '25 08:11 k8s-triage-robot

/remove-lifecycle stale

mimowo avatar Nov 05 '25 08:11 mimowo