DRA: Handle extended resource requests via DRA Driver
Enhancement Description
- One-line enhancement description (can be used as a release note): Allow DRA drivers to honor requests made via the extended resource API (e.g. `nvidia.com/gpu: 2`) rather than requiring a standard device plugin be used.
- Kubernetes Enhancement Proposal:
  - https://github.com/kubernetes/enhancements/pull/5136
  - Incremental PRs:
    - TBD
- Discussion Link:
  - https://youtu.be/fKhX_lHK8Z0?si=gq5kIFHP3ve2TXyE&t=1822
- Primary contact (assignee): @klueska, @pohly, @johnbelamaric
- Responsible SIGs: /sig node /wg device-management
- Enhancement target (which target equals to which milestone):
  - Alpha release target: 1.34
  - Beta release target: 1.35
  - Stable release target: 1.36
- [ ] Alpha
  - [x] KEP (`k/enhancements`) update PR(s):
    - https://github.com/kubernetes/enhancements/pull/5136
  - [ ] Code (`k/k`) update PR(s):
    - TBD
  - [ ] Docs (`k/website`) update PR(s):
    - TBD
- [x] KEP (
+1 yes please!
We need to sort out the requirements. A few initial questions:
- For newly created pods, I think it's clear we want this to be transparent. Existing manifests that use the extended resource API should continue to work as before, without modification.
- Can we handle this invisibly in the driver layer, or do we need to have DRA invoked at the control plane level and select the specific devices? If we don't, we will likely have a race condition - unless the scheduler can do some magical accounting (which seems possible).
- How do we handle upgrades? If we have a node running a device plugin, and we switch to the DRA driver (or we upgrade to a driver that supports both), do you have to delete the pods? Do they automatically adopt the devices? If so, how do we write those back to the allocation logic (since no DRA claim exists)?
- What happens if there are pods in a deployment, and some land on nodes with a device plugin and some on nodes with DRA drivers?
- We talked about letting specific device classes be advertised as specific extended resources. This could mean the existing resource names get mapped to specific device classes by the admin. It could also mean we have a convention like `deviceclass.k8s.io/foo: 4` for extended resource names. How do these choices interplay with the questions above?
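To make the last question concrete, here is a rough Python sketch of how the two naming options could coexist: an admin-supplied mapping is consulted first, and the `deviceclass.k8s.io/<name>` convention is the fallback. This is only an illustration under stated assumptions, not the design in the KEP. The function names, the shape of the returned request dictionaries, and the `gpu.example.com` device class name are all hypothetical.

```python
# Hypothetical sketch: none of these helper names or data shapes come from the KEP.
DEVICE_CLASS_PREFIX = "deviceclass.k8s.io/"


def resolve_device_class(resource_name, admin_mapping):
    """Return the device class name for an extended resource, or None."""
    # Option 1: an admin explicitly maps an existing resource name to a class.
    if resource_name in admin_mapping:
        return admin_mapping[resource_name]
    # Option 2: the class name is encoded in the resource name by convention.
    if resource_name.startswith(DEVICE_CLASS_PREFIX):
        class_name = resource_name[len(DEVICE_CLASS_PREFIX):]
        return class_name or None
    # Anything else (cpu, memory, ...) is not a device request.
    return None


def plan_device_requests(limits, admin_mapping):
    """Split container limits into device requests and ordinary resources."""
    device_requests = []
    passthrough = {}
    for name, quantity in limits.items():
        device_class = resolve_device_class(name, admin_mapping)
        if device_class is None:
            passthrough[name] = quantity
        else:
            # Extended resources are whole numbers, so int() is assumed safe here.
            device_requests.append(
                {"deviceClassName": device_class, "count": int(quantity)}
            )
    return device_requests, passthrough


if __name__ == "__main__":
    # "gpu.example.com" is a made-up device class name for illustration.
    mapping = {"nvidia.com/gpu": "gpu.example.com"}
    limits = {"cpu": "2", "nvidia.com/gpu": "2", "deviceclass.k8s.io/foo": "4"}
    requests, rest = plan_device_requests(limits, mapping)
    print(requests)
    print(rest)
```

Checking the explicit mapping first means existing manifests keep their current resource names, while the prefix convention needs no per-resource admin configuration. Which of the two (or both) is adopted is exactly the open question raised above.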
Can each DRA driver implement a webhook that creates a ResourceClaimTemplate when a pod is created and rewrites how the pod requests its resources?
@lengrongfu that is what this KEP would be designed to avoid. There would be integrated scheduler support for all drivers, rather than requiring each DRA driver to provide a webhook.
Open questions (from SIG Scheduling meeting):
- How to handle resource quotas
- Scheduling throughput (API requests and overall processing).
/cc
/cc
/sig scheduling
/assign @yliaog
Yu, I am assigning to you, let me know if that's OK
/label lead-opted-in /milestone v1.33
note: PRR freeze is tomorrow! you need to have a KEP update for this opened before then. Thanks!
/stage alpha
Hello @klueska @pohly @johnbelamaric @yliaog :wave:, v1.33 Enhancements team here.
Just checking in as we approach enhancements freeze on 02:00 UTC Friday 14th February 2025 / 19:00 PDT Thursday 13th February 2025.
This enhancement is targeting stage alpha for v1.33 (correct me, if otherwise)
Here's where this enhancement currently stands:
- [ ] KEP readme using the latest template has been merged into the k/enhancements repo.
- [ ] KEP status is marked as `implementable` for `latest-milestone: v1.32`.
- [ ] KEP readme has up-to-date graduation criteria
- [ ] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 6th February 2025 so that the PRR team has enough time to review your KEP.
For this KEP, we would need to update the following:
- Create the KEP readme using the latest template and merge it in the k/enhancements repo.
- Ensure that the KEP has undergone a production readiness review and has been merged into k/enhancements.
The status of this enhancement is marked as At risk for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well.
If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!
Hi @klueska @pohly @johnbelamaric @yliaog :wave:, 1.33 Enhancements team here,
Just a quick friendly reminder as we approach the enhancements freeze later this week, at 02:00 UTC Friday 14th February 2025 / 19:00 PDT Thursday 13th February 2025.
The current status of this enhancement is marked as At risk for enhancement freeze. There are a few requirements mentioned in the comment https://github.com/kubernetes/enhancements/issues/5004#issuecomment-2639373202 that still need to be completed.
If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!
@dipesh-rawat we will be doing this in 1.34 instead - I updated the description above, can you do whatever else the release team needs to properly account for that?
/remove-label lead-opted-in /remove-milestone v1.33
I see that this issue has been opted-out of v1.33 and is now planned for a future release. I will go ahead and mark it as Deferred on the v1.33 board for tracking purposes - do let the enhancement team know otherwise.
/milestone clear
/milestone v1.34
/cc
/cc
Hi @klueska @pohly @johnbelamaric @yliaog :wave:, v1.34 Enhancements Lead here.
It looks like this enhancement has been added to the v1.34 milestone, but doesn't yet have the lead-opted-in label. Just a gentle reminder that if you plan to work on this enhancement in v1.34, please have the SIG lead opt-in by adding the lead-opted-in label, which will ensure it gets added to the tracking board.
Thanks!
/label lead-opted-in
/cc
Hi @klueska @pohly @johnbelamaric @yliaog :wave:, v1.34 Enhancements team here.
This is a reminder of the upcoming PRR Freeze on Thursday 12th June 2025.
By this date, there must be a PR open in k/enhancements with:
- The KEP's PRR questionnaire filled out.
- The kep.yaml updated with the `stage`, `latest-milestone`, and `milestone` struct filled out.
- A PRR approval file with the PRR approver listed for the stage the KEP is targeting.
Having the PRR questionnaire filled out by this deadline will help ensure that the PRR team has enough time to review your KEP before Enhancements Freeze on Friday 20th June 2025. For more information on the PRR process, see here.
/cc
https://github.com/kubernetes/enhancements/pull/5136 should mean we're ready for enhancements freeze
Hello again @klueska @yliaog :wave:, v1.34 Enhancements team here.
Just checking in as we approach enhancements freeze on 21:00 UTC Friday 20th June 2025 / 14:00 PDT Friday 20th June 2025.
This enhancement is targeting stage alpha for v1.34 (correct me, if otherwise)
Here's where this enhancement currently stands:
- [x] KEP readme using the latest template has been merged into the k/enhancements repo.
- [x] KEP status is marked as `implementable` for `latest-milestone: v1.34`.
- [x] KEP readme has up-to-date graduation criteria.
- [x] KEP has submitted a production readiness review request for approval and has a reviewer assigned.
- [x] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here).
With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. :rocket:
The status of this enhancement is marked as Tracked for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well.
Thank you!
Hi @klueska, @pohly, @johnbelamaric :wave: -- this is Agus (@aibarbetta) from the v1.34 Communications Team!
For the v1.34 release, we are currently in the process of collecting and curating a list of potential feature blogs, and we'd love for you to consider writing one for your enhancement!
As you may be aware, feature blogs are a great way to communicate to users about features which fall into (but are not limited to) the following categories:
- This introduces some breaking change(s)
- This has significant impacts and/or implications to users
- ...Or this is a long-awaited feature, which would go a long way to cover the journey more in detail
To opt in to write a feature blog, could you please let us know and open a "Feature Blog placeholder PR" (which can be only a skeleton at first) against the website repository by Friday 11th July? For more information about writing a blog, please find the blog contribution guidelines
[!Tip] Some timeline to keep in mind:
- 02:00 UTC Friday 11th July 2025: Feature blog PR freeze
- Friday 8th August 2025: Feature blogs ready for review
- You can find more in the release document
[!Note] In your placeholder PR, use `XX` characters for the blog `date` in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.
Hi @klueska, @pohly, @johnbelamaric :wave:, 1.34 Docs Shadow here.
Does the enhancement work planned for 1.34 require any new docs or modifications to existing docs? If so, please follow the steps here to open a PR against the dev-1.34 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 3rd July 2025 18:00 PDT.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!
Just a reminder to open a placeholder PR against the dev-1.34 branch in the k/website repo (steps available here) for this KEP if it requires new docs or modifications to existing docs.
The deadline for this is Thursday July 3 at 18:00 PDT. Thanks!
Created a placeholder docs PR: https://github.com/kubernetes/website/pull/51485