[WIP][DRAFT] Add WorkloadSlice Support to Enable Mutable Workloads
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces the foundational implementation of WorkloadSlices in Kueue, as proposed in KEP-77. WorkloadSlices enable controlled scaling of admitted workloads (e.g., scale-up) while preserving Kueue's scheduling guarantees and resource tracking semantics.
📌 Summary
- Introduces the WorkloadSlice concept as a transient workload object representing a logical scale-up request.
- Enables mutable workload behavior using a dual-Workload model:
- Original admitted workload.
- A new WorkloadSlice requesting the additional capacity (a temporary 1 Job : 2 Workloads state during the transition).
- Admission of the new WorkloadSlice triggers preemption of the original Workload, even if additional capacity is available, to enforce consistent admission-state transitions.
- Uses Pod scheduling gates (instead of `spec.suspend`) to gate new pods until the slice is admitted (see the sketch after this list).
- Defaulting logic enables the feature automatically for supported jobs (e.g., batch/v1 Job, RayJob).
- Ensures all new pods created during the transition are gated until the corresponding Workload is admitted.
- Aggregates admission state and lifecycle management into the core Workload controller flow.
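To make the scheduling-gate approach concrete, below is a minimal sketch of how a job integration could keep newly created pods unschedulable until the corresponding slice is admitted. The gate name `kueue.x-k8s.io/workload-slice` is a placeholder chosen for illustration, not necessarily the constant used by this PR.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
)

// sliceSchedulingGate is a placeholder gate name used for illustration only;
// the actual constant is defined by the Kueue implementation.
const sliceSchedulingGate = "kueue.x-k8s.io/workload-slice"

// gatePodTemplate appends the scheduling gate to a pod template unless it is
// already present. Pods created from a gated template stay in the
// SchedulingGated state until the gate is removed, so pods added during a
// scale-up cannot be scheduled before the WorkloadSlice is admitted.
func gatePodTemplate(tpl *corev1.PodTemplateSpec) {
	for _, g := range tpl.Spec.SchedulingGates {
		if g.Name == sliceSchedulingGate {
			return
		}
	}
	tpl.Spec.SchedulingGates = append(tpl.Spec.SchedulingGates,
		corev1.PodSchedulingGate{Name: sliceSchedulingGate})
}
```

Unlike `spec.suspend`, which pauses the whole Job, gating applies per pod, so already-running pods keep running while only the newly added pods wait for admission.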
📎 Additional Notes
- Fully backward-compatible with existing single-Workload flow.
- Includes tests for:
- Slice creation logic.
- Admission/preemption interaction.
- Scheduling gate behavior.
- Documentation and KEP link updates to follow in separate PR.
⚠️ Known Limitations
- Multi-cluster support for WorkloadSlices is still a work in progress and will be addressed either in this PR or in follow-up PRs.
Which issue(s) this PR fixes:
Fixes #5528
Special notes for your reviewer:
Does this PR introduce a user-facing change?
This change introduces the foundational support for WorkloadSlices in accordance with KEP-77. The initial implementation targets batch/v1.Job, enabling horizontal scaling through slice-based workload admission, scheduling gate control, and slice preemption handling.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: ichekrygin / name: Illya Chekrygin
Hi @ichekrygin. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
/ok-to-test
/retest
/retitle Support Elastic Jobs via WorkloadSlices
Just to align with the recent naming in the KEP.
/retest
@mimowo, thank you for the detailed and insightful feedback. I reviewed, remedied, and/or replied to all your comments. PTAL when you get a moment.
Thank you @ichekrygin. I'm pretty confident about the PR as far as Alpha goes. Going forward, I have a couple of thoughts we may tackle in future iterations, or even in the first one if you have time:
- What happens if the old workload is preempted by another workload? Then we will end up with two Pending workloads. I'm wondering if it would be better to mark the old one as Finished in this case.
- Would the scale-down support work on day one for all other Job types, like RayCluster or JobSet?
- I would like to invert control of ungating the pods and move it to a dedicated controller. It would observe Workloads and, for an admitted workload, find all the associated pods and ungate a given number of them. This could work for arbitrary Job types; otherwise we need a lot of integration-specific code. A similar problem is solved in TopologyUngater, and I'm happy to drive the work on commonizing the approaches (a rough sketch of the idea follows below).
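A rough, hedged sketch of that dedicated-ungater idea, assuming (hypothetically) that pods carry a workload label and a single well-known scheduling gate; neither identifier below is necessarily what Kueue or this PR uses:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Assumed for illustration only: both names below are placeholders, not the
// identifiers used by Kueue.
const (
	workloadLabel = "kueue.x-k8s.io/workload"    // hypothetical pod label
	elasticGate   = "kueue.x-k8s.io/elastic-job" // hypothetical gate name
)

// ungatePods removes the scheduling gate from up to "count" gated pods that
// belong to the given admitted workload. A dedicated controller could call
// this for any Job type, avoiding integration-specific ungating code.
func ungatePods(ctx context.Context, c client.Client, namespace, workload string, count int) error {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(namespace),
		client.MatchingLabels{workloadLabel: workload}); err != nil {
		return err
	}
	for i := range pods.Items {
		if count == 0 {
			return nil
		}
		pod := &pods.Items[i]
		kept := pod.Spec.SchedulingGates[:0]
		removed := false
		for _, g := range pod.Spec.SchedulingGates {
			if g.Name == elasticGate {
				removed = true
				continue
			}
			kept = append(kept, g)
		}
		if !removed {
			continue
		}
		pod.Spec.SchedulingGates = kept
		if err := c.Update(ctx, pod); err != nil {
			return err
		}
		count--
	}
	return nil
}
```

Wired into a controller watching Workloads, this would ungate pods generically for any Job type, along the lines of how TopologyUngater handles its analogous problem.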
> What happens if the old workload is preempted by another workload? Then we'll end up with two Pending workloads. I'm wondering if it would be better to mark the old one as Finished in this case.
Good question. The old workload slice shows up as a preemption target because of how flavor assignment works. And yeah, it’s possible that multiple workloads being scheduled could try to preempt the same old slice.
That said, Kueue already handles overlapping preemption targets. If a workload’s preemption target was already finalized earlier in the same scheduling cycle, that workload just gets skipped, whether it’s a new slice or something else.
So if the old slice gets evicted by some workload other than the new one, the new slice will get skipped too, leaving it pending with the new (scaled-up) definition but not admitted. And the reverse is also true: if the new slice evicts the old one, any other workloads that were planning to preempt the old slice will get skipped.
This keeps us from ending up with multiple workloads depending on the same preemption target, and avoids running into conflicting Pending states.
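Purely as an illustration of that skip behavior (not the actual scheduler code), a scheduling cycle can be thought of as claiming preemption targets and skipping any later candidate whose targets were already claimed:

```go
package example

// candidate pairs a pending workload with the workloads it would need to
// preempt (for a new slice, that includes its own old slice).
type candidate struct {
	name    string
	targets []string
}

// admitCycle sketches one scheduling cycle: once a preemption target has been
// claimed by an earlier candidate, any later candidate that also depends on
// it is skipped and stays Pending until the next cycle.
func admitCycle(candidates []candidate) (admitted, skipped []string) {
	claimed := map[string]bool{}
	for _, c := range candidates {
		conflict := false
		for _, t := range c.targets {
			if claimed[t] {
				conflict = true
				break
			}
		}
		if conflict {
			skipped = append(skipped, c.name)
			continue
		}
		for _, t := range c.targets {
			claimed[t] = true
		}
		admitted = append(admitted, c.name)
	}
	return admitted, skipped
}
```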
I looked at https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/5510/pull-kueue-test-integration-baseline-main/1946075894124646400 and it seems like an unrelated flake. Note that the integration test does not enable the new feature gate.
Let me retry; if confirmed, I will open an issue.
/test pull-kueue-test-integration-baseline-main
Opened: https://github.com/kubernetes-sigs/kueue/issues/6018
@ichekrygin I think this is very close to being mergeable. I left a bunch of comments, mostly renames to use "replacing" terminology consistently rather than preemption, because the mechanism only marginally relies on preemptions.
It would also be great to add integration tests for the happy path. The release is on Friday, so I think we still have a bit of time to address the comments.
Feel free to also squash the commits. There are 33 of them; I highly doubt anyone would like to traverse them :)
LGTM, but please address the remaining comments
Let's make the note a bit more user-oriented; I think the workload-slice replacement is more of a technical detail. Putting a link to KEP-77 is probably enough for interested readers.
/release-note-edit
Support for Elastic Jobs (dynamically sized Jobs) in Alpha, as designed in [KEP-77](https://github.com/kubernetes-sigs/kueue/tree/main/keps/77-dynamically-sized-jobs).
The implementation supports resizing (scale up and down) of batch/v1.Job and is behind the Alpha
`ElasticJobsViaWorkloadSlices` feature gate. Jobs which are subject to resizing need to have the
`kueue.x-k8s.io/elastic-job` annotation added at creation time.
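For illustration, here is a sketch of such a Job built with the client-go API types. The `kueue.x-k8s.io/queue-name` label is Kueue's standard queue label and the annotation key comes from the note above; the annotation value, names, and image are assumptions made for the example.

```go
package example

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// elasticJob sketches a batch/v1 Job that opts into resizing. The annotation
// must be present at creation time, and the ElasticJobsViaWorkloadSlices
// feature gate must be enabled on the Kueue controller. The annotation value
// "true" is an assumption for this example.
func elasticJob() *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "sample-elastic-job",
			Namespace: "default",
			Labels:    map[string]string{"kueue.x-k8s.io/queue-name": "user-queue"},
			Annotations: map[string]string{
				"kueue.x-k8s.io/elastic-job": "true",
			},
		},
		Spec: batchv1.JobSpec{
			Parallelism: ptr.To[int32](2),
			Completions: ptr.To[int32](4),
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "worker",
						Image:   "busybox",
						Command: []string{"sleep", "60"},
					}},
				},
			},
		},
	}
}
```

Resizing would then amount to patching `spec.parallelism` (e.g., from 2 to 4), with the extra pods gated until the new slice is admitted.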
/lgtm
/approve
Thank you for your relentless work on KEP-77 and this implementation PR. This is one of the oldest and most anticipated KEPs in Kueue. While we still have a long way to go (e.g., support for other Job CRDs, MultiKueue, TAS), this is a huge milestone, and I'm very happy to get this in.
FYI @tenzen-y: Since the release is approaching and all of my comments have been addressed, I am merging this now to avoid potential conflicts with other PRs. I've taken extra care to ensure all new code is behind the alpha feature gate. Please feel free to add any further comments or open a new issue for follow-up items. I'm confident we can address them.
LGTM label has been added.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ichekrygin, mimowo