
Support automatic quota detection in ClusterQueue based on available nodes

Open andrewsykim opened this issue 1 year ago • 7 comments

What would you like to be added:

I would like to create a ClusterQueue resource whose quotas are automatically derived from the available node capacity in my Kubernetes cluster. I have configured my Kubernetes cluster with autoscaling and a specified maximum number of nodes.

Why is this needed:

Calculating and adjusting quotas manually for a ClusterQueue is toilsome. Often the total quota of a cluster is managed in the form of max nodes in a node pool. It would be great if a ClusterQueue could automatically detect the total available capacity and dynamically set quotas for resources.

Completion requirements:

This enhancement requires the following artifacts:

  • [X] Design doc
  • [X] API change
  • [X] Docs update

The artifacts should be linked in subsequent comments.

andrewsykim commented Oct 02 '24 20:10

This would only work for clusters with a single ClusterQueue, but I think it would still be useful.

An alternative approach is to allow quotas to be specified as percentages instead of strict resource quantities.

andrewsykim commented Oct 02 '24 20:10

Thank you for opening the discussion; this is definitely on our radar.

For autoscaled environments, Kueue would need to learn about the max-nodes configuration and understand the node resources to automatically adjust ClusterQueue quotas. I'm not sure we have a readily available API (like CA CRDs) to read this information from, so it may require preparatory work in CA. This needs some exploration.

For non-autoscaling environments, we're working on Topology-Aware Scheduling. Part of this feature involves scraping node capacities, effectively limiting the quota based on the currently available nodes. We are not planning to support CA in the first iteration of TAS, but may revisit this in future iterations.

Expressing quotas as percentages within a cohort sounds useful to reduce the manual toil, and could be done as an independent feature. This concept is similar to the P&F (Priority and Fairness) configuration, with parameters like lendablePercent and borrowingLimitPercent.
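
For illustration only, here is a hypothetical shape such percentage-based quotas could take. None of the *Percent fields below exist in today's Kueue API, which expresses nominalQuota, borrowingLimit, and lendingLimit as absolute quantities; the flavor and cohort names are placeholders:

```yaml
# Hypothetical sketch only: the *Percent fields are NOT part of the current Kueue API.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  cohort: shared-pool                # placeholder cohort name
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor           # placeholder ResourceFlavor
      resources:
      - name: "cpu"
        nominalQuotaPercent: 30      # hypothetical: 30% of the detected cluster capacity
        lendablePercent: 50          # hypothetical: analogous to P&F's lendablePercent
        borrowingLimitPercent: 20    # hypothetical: analogous to P&F's borrowingLimitPercent
```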

/cc @mwielgus

mimowo commented Oct 03 '24 06:10

Thanks for the reply!

Kueue would need to learn about the max-nodes configuration and understand the node resources to automatically adjust ClusterQueue quotas.

Do we need the max-nodes configuration from CA? Could we instead just watch for new nodes and dynamically adjust the quotas? I guess the challenge with either approach is that there will be Pods on every node (DaemonSets) that won't necessarily consume quota from the ClusterQueue, so we would need user input to know how much of each node's resources can be allocated to the quota. That would not be that different from supporting quotas as percentages.
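
For a rough sense of the arithmetic (hypothetical numbers): with 10 nodes of 8 allocatable CPUs each and roughly 0.5 CPU of DaemonSet pods per node, the quota that could safely be exposed through a ClusterQueue would be about 10 × (8 − 0.5) = 75 CPUs. That 0.5 per-node overhead is exactly the value that either user input or DaemonSet scraping would have to supply.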

andrewsykim commented Oct 03 '24 15:10

Could we instead just watch for new nodes and dynamically adjust the quotas?

This is pretty much the approach we take in Topology Aware Scheduling (TAS), but in autoscaling environments you don't have the nodes until the workload is admitted (so kind of a chicken and egg problem?).

I guess the challenge with either approach is that there will be Pods on every node (DaemonSets) that won't necessarily consume quota from the ClusterQueue, so we would need user input to know how much of each node's resources can be allocated to the quota

Yeah, for TAS we plan to scrape the information about Pod usage from DaemonSets, either by watching DaemonSets or the Pods directly.

mimowo commented Oct 03 '24 15:10

Thanks @mimowo, great to hear you're already thinking about this

andrewsykim commented Oct 03 '24 17:10

An alternative approach is to allow quotas to be specified as percentages instead of strict resource quantities.

For this idea, you can achieve something very close to it with fair sharing. You can basically have a cohort and assign fair sharing weights to the ClusterQueues within the cohort. The weights act as priorities and roughly translate to "percentages". For example, if you have 3 CQs and you want them to share load in roughly 10%, 20%, 70% proportions, you can assign the fair sharing weights as: 10, 20, 70. One reason these are weights rather than percentages is that it allows mutating the values and the set of CQs without having to keep the sum at 100.
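
For illustration, a minimal sketch of that 10/20/70 setup (assuming a pre-existing ResourceFlavor named default-flavor and a cohort named shared-pool; fair sharing also has to be enabled in the Kueue Configuration, via fairSharing.enable, if I recall correctly):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  cohort: shared-pool        # team-b and team-c join the same cohort
  fairSharing:
    weight: 10               # team-b: 20, team-c: 70
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor   # assumed pre-existing ResourceFlavor
      resources:
      - name: "cpu"
        nominalQuota: 32     # placeholder value
```

The other two ClusterQueues would look the same except for the name and weight. The weights only influence how the cohort's shared capacity is divided under contention, so the nominal quotas still have to be set to something that reflects real cluster capacity.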

mimowo commented Oct 11 '24 09:10

As an alternative, I'm wondering if we can obtain the simulation result from the CA. AFAIK, the CA has a Pod scheduling simulation mechanism and resizes the cluster based on the simulation result, right?

So I'm curious whether we could obtain the Pod scheduling simulation result and auto-adjust the CQ configuration.

But that solution would probably require a massive development cost, I guess...

tenzen-y commented Oct 11 '24 12:10

For this idea, you can achieve something very close to it with fair sharing

@mimowo wouldn't you still need to assign total nominal quota to match the available capacity somewhere, so Kueue knows what quota is available to borrow in the cohort? If the nominal quota is higher than capacity, pods risk getting admitted but staying stuck as unschedulable; if the quota is lower than capacity, Kueue won't admit workloads that could use the full available capacity. Wonder if I'm missing something here.
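
As a concrete (hypothetical) illustration: with 12 nodes of 8 allocatable CPUs each (~96 CPUs total), cohort nominal quotas summing to roughly 96 keep admission aligned with what the cluster can actually run; quotas summing to 120 risk admitting pods that then sit unschedulable, while quotas summing to 60 leave ~36 CPUs that Kueue will never admit workloads for.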

CecileRobertMichon commented Jan 07 '25 16:01

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented Apr 07 '25 17:04

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented May 07 '25 18:05

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented Jun 06 '25 19:06

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot commented Jun 06 '25 19:06