
Mutable Container Resources when Job is suspended

Open • kannon92 opened this issue 6 months ago • 21 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): Allow mutating container requests/limits on a Job's PodTemplate while the Job is suspended

  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/5440-mutable-job-pod-resource-updates

  • Discussion Link: https://github.com/kubernetes/kubernetes/issues/132436

  • PRs by stage and milestone:

    • [x] Alpha - v1.35
      • [x] KEP (k/enhancements) update PR(s):
        • https://github.com/kubernetes/enhancements/pull/5441
      • [x] Code (k/k) update PR(s):
        • https://github.com/kubernetes/kubernetes/pull/132441
      • [x] Docs (k/website) update PR(s):
        • https://github.com/kubernetes/website/pull/52800
    • [ ] Beta - v1.36
      • [ ] KEP (k/enhancements) update PR(s):
        • https://github.com/kubernetes/enhancements/pull/5703
      • [x] Code (k/k) update PR(s):
        • https://github.com/kubernetes/kubernetes/pull/135381
      • [ ] Docs (k/website) update(s):
    • [ ] Stable - v1.xx
      • [ ] KEP (k/enhancements) update PR(s):
      • [ ] Code (k/k) update PR(s):
      • [ ] Docs (k/website) update(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
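As a rough illustration of what the enhancement enables, here is a minimal sketch, assuming the caller updates resources with an RFC 6902 JSON Patch against a batch/v1 Job. The helper `buildResizePatch` and the `resources` struct are hypothetical, and the "suspended" check is only a simplified client-side guard; the API server's actual validation is defined by the KEP and may impose further conditions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// resources mirrors the JSON shape of a container's "resources" field:
// requests and limits keyed by resource name, with quantities as strings.
type resources struct {
	Requests map[string]string `json:"requests,omitempty"`
	Limits   map[string]string `json:"limits,omitempty"`
}

// buildResizePatch (hypothetical helper) returns a JSON Patch that replaces
// the resources of container idx in a Job's pod template. It refuses to build
// a patch unless the caller reports that the Job is suspended, as a simplified
// client-side stand-in for the server-side validation.
func buildResizePatch(suspended bool, idx int, r resources) ([]byte, error) {
	if !suspended {
		return nil, fmt.Errorf("job is not suspended: pod template resources are immutable")
	}
	ops := []patchOp{{
		Op:    "replace",
		Path:  fmt.Sprintf("/spec/template/spec/containers/%d/resources", idx),
		Value: r,
	}}
	return json.Marshal(ops)
}

func main() {
	patch, err := buildResizePatch(true, 0, resources{
		Requests: map[string]string{"cpu": "2", "memory": "4Gi"},
		Limits:   map[string]string{"memory": "4Gi"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints the patch; map keys are emitted in sorted order.
	fmt.Println(string(patch))
}
```

The printed patch could then be applied to a suspended Job with `kubectl patch job <name> --type=json -p '<patch>'`.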

kannon92 · Jun 28 '25

/sig apps

kannon92 · Jun 28 '25

So I have a POC and a KEP for this covering device plugins and classic resources (CPU/memory).

I realize that more work is needed for DRA.

Right now, DRA does not allow modification of a ResourceClaimTemplate once it is created.

And you are also not able to change the claim on a PodTemplate.

I am not sure of the path forward for DRA, though.

Option 1:

  • Someone creates a new ResourceClaimTemplate and patches the workload to point to it.
  • This is at least auditable from the API logs, since the user must patch the workload.

Option 2:

  • Someone modifies the ResourceClaimTemplate in place.
  • The workload would pick up the updated ResourceClaimTemplate.
  • Can the workload author really control whether someone changes this under them?

kannon92 · Jul 07 '25

I met with wg-device-management about DRA.

For now, if no pods are using it, one solution is to recreate the ResourceClaimTemplate with the updated requirements.

With this approach, we don't need to relax immutability for claims.

kannon92 · Jul 08 '25

@soltysh @janetkuo I would like to try and get this in for 1.35 if possible.

kannon92 · Sep 19 '25

/label lead-opted-in
/milestone v1.35
/stage alpha

soltysh · Sep 19 '25

/assign @kannon92

soltysh · Sep 19 '25

In the KEP, I was actually thinking of starting this at beta, enabled by default.

Following the precedent of https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/2926-job-mutable-scheduling-directives/kep.yaml, I think this would be fine, since there is no API change and I'm mostly relaxing validation logic.

But of course this is up to the SIG.

kannon92 · Sep 19 '25

I think we can evaluate the risks and advantages in the KEP, and then choose either the alpha or the beta stage.

tenzen-y · Sep 19 '25

Hello @kannon92 👋, v1.35 Enhancements team here.

This is a reminder of the upcoming PRR freeze on Thursday 9th October 2025 (AoE) / Friday 10th October 2025, 12:00 UTC.

This enhancement is targeting stage alpha for v1.35 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [x] PR open or merged with the KEP's PRR questionnaire filled out.

  • [ ] PR open or merged with kep.yaml updated with the stage, latest-milestone, and milestone struct filled out.

  • [ ] PR open or merged with a PRR approval file with the PRR approver listed for the stage the KEP is targeting.

For this KEP, we would just need to update the following:

  • kep.yaml, with the latest information
  • the PRR approval file

Note that the PRs are not required to be approved or merged by the PRR freeze deadline. Having the PRR questionnaire filled out by the deadline will help ensure that the PRR team has enough time to review your KEP before enhancements freeze on Thursday 16th October 2025 (AoE) / Friday 17th October 2025, 12:00 UTC. For more information on the PRR process, see here.

The status of this enhancement is marked as At risk for PRR freeze. Please keep the issue description up-to-date with appropriate stages as well.

If you anticipate missing PRR freeze, you can file an exception request in advance. Thank you!

whtssub · Oct 06 '25

We are hopefully going directly to beta and skipping alpha.

I believe everything is addressed by https://github.com/kubernetes/enhancements/pull/5441.

PTAL if this is not the case.

kannon92 · Oct 06 '25

With the PR open, this enhancement is now Tracked for PRR freeze. Thanks!

/label tracked/yes

rayandas · Oct 09 '25

Hello @kannon92 👋, v1.35 Enhancements team here.

Just checking in as we approach enhancements freeze on Thursday 16th October 2025 (AoE) / Friday 17th October 2025, 12:00 UTC.

This enhancement is targeting stage alpha for v1.35 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [X] KEP readme using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable for latest-milestone: v1.35. KEPs targeting stable will need to be marked as implemented after code PRs are merged.
  • [X] KEP readme has up-to-date graduation criteria.
  • [X] KEP has submitted a production readiness review request for approval and has a reviewer assigned.
  • [X] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here).

With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as Tracked for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

whtssub · Oct 14 '25

Hello @kannon92 :wave:, v1.35 Docs Shadow here.

Does this enhancement work planned for v1.35 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.35 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 23rd October 2025.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

kernel-kun · Oct 16 '25

@kernel-kun I opened up https://github.com/kubernetes/website/pull/52800 for documentation.

kannon92 · Oct 19 '25

Hi @kannon92 👋 -- this is Chad (@chadmcrowell) from the 1.35 Communications Team!

For the 1.35 release, we are currently in the process of collecting and curating a list of potential feature blogs, and we'd love for you to consider writing one for your enhancement!

As you may be aware, feature blogs are a great way to communicate to users about features which fall into (but are not limited to) the following categories:

  • This introduces some breaking change(s)
  • This has significant impacts and/or implications to users
  • ...Or this is a long-awaited feature, where a blog can cover the journey in more detail 🎉

To opt in to write a feature blog, could you please let us know and open a "Feature Blog placeholder PR" (which can be just a skeleton at first) against the website repository by Friday, the 31st of October? For more information about writing a blog, please see the blog contribution guidelines 📚

Tip: Some key dates to keep in mind:

  • 12:00 UTC Friday, 31st October: Feature blog PR freeze
  • Friday, 21st November: Feature blogs ready for review
  • You can find more in the release document

Note: In your placeholder PR, use XX characters for the blog date in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.

chadmcrowell · Oct 21 '25

Hi @kannon92 👋, 1.35 Communications Team here again!

This is a gentle reminder for the feature blog deadline mentioned above, which is 12:00 UTC Friday, 31st October. To opt in, please let us know and open a Feature Blog placeholder PR against k/website by the deadline. If you have any questions, please feel free to reach out to us!

chadmcrowell · Oct 27 '25

Hey again @kannon92 👋, v1.35 Enhancements team here,

Just checking in as we approach code freeze and test freeze on Thursday 6th November 2025 (AoE) / Friday 7th November 2025, 12:00 UTC.

Here's where this enhancement currently stands:

  • [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [ ] All PRs are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

Per the issue description, these are all of the implementation (code-related) PRs for v1.35, some of which are not merged yet:

  • https://github.com/kubernetes/kubernetes/pull/132441
  • https://github.com/kubernetes/kubernetes/pull/134629

Please let me know (and keep the issue description updated) if there are any other PRs in k/k that we should track for this KEP, so that we can maintain accurate status.

If the implementation work for this enhancement is occurring out-of-tree (i.e., outside of k/k), please link the relevant PRs in the issue description for visibility. Alternatively, if you're unable to provide specific PR links, a confirmation that all out-of-tree implementation work is complete and merged will help us finalize tracking and maintain accuracy.

The status of this enhancement is marked as At risk for code freeze.

If you anticipate missing code freeze, you can file an exception request in advance.

whtssub · Nov 03 '25

Hello @kannon92 👋, v1.35 Enhancements team here.

With all the implementation (code-related) PRs merged per the issue description, this enhancement is now marked as Tracked for the v1.35 Code Freeze!

Please note that KEPs targeting stable need to have the status field marked as implemented in the kep.yaml file after code PRs are merged.

/label tracked/yes

whtssub · Nov 07 '25

@whtssub: The label(s) /label tracked/yes1.35 cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full, lead-opted-in, tracked/no, tracked/out-of-tree, tracked/yes. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

k8s-ci-robot · Nov 07 '25

Hello @kannon92 👋, v1.35 Docs Team here again!

Please take a look at Documenting for a release - PR Ready for Review to get your Docs PR ready for review before Tuesday 18th November 2025.

Please let us know once your PR is fully Ready for Review (meaning all documentation updates are complete and it's awaiting reviewer feedback) so we can update our tracking.

Thank you!

kernel-kun · Nov 09 '25

With docs https://github.com/kubernetes/website/pull/52800 merged, this KEP is now tracked for the docs freeze.

kernel-kun · Dec 01 '25