In-Place Update of Pod Resources
Enhancement Description
- One-line enhancement description (can be used as a release note): This issue tracks a list of KEP review conversations that need resolving before we GA the feature.
- Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources
- Primary contact (assignee): @tallclair @Jeffwan @vinaykul
- Responsible SIGs: sig-node, sig-autoscaling
- Enhancement target (which target equals to which milestone):
  - Alpha release target (1.27)
  - Beta release target (past 1.33?)
  - Stable release target (past )
- [x] Alpha (v1.27~v1.29)
  - [x] KEP (k/enhancements) update PR(s):
    - [x] 1.30 https://github.com/kubernetes/enhancements/pull/4433
    - [x] v1.28 https://github.com/kubernetes/enhancements/pull/3944
    - [x] https://github.com/kubernetes/enhancements/pull/4078
    - [x] v1.17 initial https://github.com/kubernetes/enhancements/pull/686
  - [x] Code (k/k) update PR(s):
    - [x] v1.25 CRI: https://github.com/kubernetes/kubernetes/pull/111645
    - [x] v1.27
      - [x] https://github.com/kubernetes/kubernetes/pull/102884
      - [x] https://github.com/kubernetes/kubernetes/pull/116119
      - [x] https://github.com/kubernetes/kubernetes/pull/116271
      - [x] https://github.com/kubernetes/kubernetes/pull/116351
      - [x] https://github.com/kubernetes/kubernetes/pull/116450
      - [x] https://github.com/kubernetes/kubernetes/pull/116504
      - [x] https://github.com/kubernetes/kubernetes/pull/116684
      - [x] https://github.com/kubernetes/kubernetes/pull/116702
      - [x] https://github.com/kubernetes/kubernetes/pull/116857
    - [x] v1.29
      - [x] https://github.com/kubernetes/kubernetes/pull/119665
      - [x] https://github.com/kubernetes/kubernetes/pull/118768
      - [x] https://github.com/kubernetes/kubernetes/pull/117615
      - [x] https://github.com/kubernetes/kubernetes/pull/112599
      - [x] https://github.com/kubernetes/kubernetes/pull/120145
  - [x] Docs (k/website) update PR(s):
    - [x] https://github.com/kubernetes/website/pull/39846
    - [x] https://github.com/kubernetes/website/pull/39845
- [ ] Beta (v1.33)
  - [x] KEP (k/enhancements) update PR(s):
    - [x] 1.32 https://github.com/kubernetes/enhancements/pull/4704
    - [x] 1.33 https://github.com/kubernetes/enhancements/pull/5089
  - [x] Code (k/k) update PR(s):
    - [x] https://github.com/kubernetes/kubernetes/pull/128771
    - [x] https://github.com/kubernetes/kubernetes/pull/128718
    - [x] https://github.com/kubernetes/kubernetes/pull/128713
    - [x] https://github.com/kubernetes/kubernetes/pull/128694
    - [x] https://github.com/kubernetes/kubernetes/pull/128687
    - [x] https://github.com/kubernetes/kubernetes/pull/128683
    - [x] https://github.com/kubernetes/kubernetes/pull/128676
    - [x] https://github.com/kubernetes/kubernetes/pull/128680
    - [x] https://github.com/kubernetes/kubernetes/pull/128623
    - [x] https://github.com/kubernetes/kubernetes/pull/128598
    - [x] https://github.com/kubernetes/kubernetes/pull/128551
    - [x] https://github.com/kubernetes/kubernetes/pull/128518
    - [x] https://github.com/kubernetes/kubernetes/pull/128377
    - [x] https://github.com/kubernetes/kubernetes/pull/128296
    - [x] https://github.com/kubernetes/kubernetes/pull/128287
    - [x] https://github.com/kubernetes/kubernetes/pull/128269
    - [x] https://github.com/kubernetes/kubernetes/pull/128266
    - [x] https://github.com/kubernetes/kubernetes/pull/128186
    - [x] https://github.com/kubernetes/kubernetes/pull/128143
    - [x] https://github.com/kubernetes/kubernetes/pull/125708
    - [x] https://github.com/kubernetes/kubernetes/pull/126620
    - [x] https://github.com/kubernetes/kubernetes/pull/127300
    - [x] https://github.com/kubernetes/kubernetes/pull/127291
    - [x] https://github.com/kubernetes/kubernetes/pull/127275
    - [x] https://github.com/kubernetes/kubernetes/pull/124216
    - [x] https://github.com/kubernetes/kubernetes/pull/125757
    - [x] https://github.com/kubernetes/kubernetes/pull/124227
    - [x] https://github.com/kubernetes/kubernetes/pull/128123
    - [x] https://github.com/kubernetes/kubernetes/pull/128367
    - [x] https://github.com/kubernetes/kubernetes/pull/128685
    - [x] https://github.com/kubernetes/kubernetes/pull/128719
    - [x] https://github.com/kubernetes/kubernetes/pull/128920
    - [x] https://github.com/kubernetes/kubernetes/pull/129216
    - [x] https://github.com/kubernetes/kubernetes/pull/129477
    - [x] https://github.com/kubernetes/kubernetes/pull/129717
    - [x] https://github.com/kubernetes/kubernetes/pull/130183
    - [x] https://github.com/kubernetes/kubernetes/pull/130254
    - [x] https://github.com/kubernetes/kubernetes/pull/130559
    - [x] https://github.com/kubernetes/kubernetes/pull/130574
    - [x] https://github.com/kubernetes/kubernetes/pull/130599
    - [x] https://github.com/kubernetes/kubernetes/pull/130733
    - [x] https://github.com/kubernetes/kubernetes/pull/130880
    - [x] https://github.com/kubernetes/kubernetes/pull/130902
    - [x] https://github.com/kubernetes/kubernetes/pull/130905
    - [x] https://github.com/kubernetes/kubernetes/pull/130917
    - [x] https://github.com/kubernetes/kubernetes/pull/130831
  - [x] Docs (k/website) update PR(s):
    - [x] https://github.com/kubernetes/website/pull/50290
Please keep this description up to date. This will help the Enhancement Team efficiently track the evolution of the enhancement.
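For readers new to this tracking issue, here is a minimal sketch (not part of the KEP itself) of what an in-place resize looks like from a client's point of view, assuming a cluster with the InPlacePodVerticalScaling feature gate enabled. The namespace, pod name, and container name below are hypothetical; recent releases route the patch through the `resize` subresource, while earlier alpha releases patched the pod spec directly.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative only).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Strategic-merge patch that bumps CPU for the (hypothetical) container "app"
	// in pod "my-pod" without recreating the pod.
	patch := []byte(`{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"1"}}}]}}`)

	pod, err := client.CoreV1().Pods("default").Patch(
		context.TODO(), "my-pod", types.StrategicMergePatchType, patch,
		metav1.PatchOptions{}, "resize") // drop the "resize" subresource argument on alpha releases
	if err != nil {
		panic(err)
	}

	// ContainerStatuses[].Resources reports what is actually applied on the node.
	fmt.Println("status resources:", pod.Status.ContainerStatuses[0].Resources)
}
```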
- ~~Identify CRI changes needed for UpdateContainerResources API, define response message for UpdateContainerResources~~
  - ~~Extend UpdateContainerResources API to return info such as ‘not supported’, ‘not enough memory’, ‘successful’, ‘pending page evictions’ etc.~~
  - ~~Define expected behavior for runtime when UpdateContainerResources is invoked. Define timeout duration of the CRI call.~~
  - Resolution: Separate KEP for CRI changes.
    - Discussed draft CRI changes with SIG-Node on Oct 22, and we agreed to do this as an incremental change outside the scope of this KEP, in a new mini-KEP. It does not block implementation of this KEP.
- Define behavior when multiple containers are being resized, and UpdateContainerResources fails for one or more containers.
  - One possible solution: do not update Status.Resources.Limits if the UpdateContainerResources API fails, and keep retrying until it succeeds (see the retry sketch after this list).
- ~~Check with API reviewers if we can keep maps instead of a list of named sub-objects for ResizePolicy.~~
  - After discussion with @liggitt, we are going to use a list of named sub-objects for extensibility.
- Can we find a more intuitive name for ResizePolicy?
- Can we use ResourceVersion to figure out the ordering of Pod resize requests?
- ~~Do we need to add back the ‘RestartPod’ resize policy? Is there a strong use-case for it?~~
  - Resolution: No.
    - Discussed with SIG-Node on Oct 15th, not adding RestartPod policy for simplicity, will revisit if we encounter problems.
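To make the retry idea above ("keep the last applied limits and retry") concrete, here is a rough, purely illustrative sketch; the type and function names are hypothetical and this is not the kubelet's actual implementation.

```go
package sketch

// runtimeClient stands in for the CRI client; only the call we care about is shown.
type runtimeClient interface {
	UpdateContainerResources(containerID string, desired *containerLimits) error
}

// containerLimits is a hypothetical stand-in for per-container limits.
type containerLimits struct {
	CPUQuota         int64
	MemoryLimitBytes int64
}

// applyResize tries to apply the desired limits to every container in a pod.
// Status limits are only advanced after the runtime acknowledges the update,
// so a failed container keeps reporting its last applied values and is
// returned for a later retry.
func applyResize(rt runtimeClient, desired, statusLimits map[string]*containerLimits) (pending []string) {
	for id, want := range desired {
		if err := rt.UpdateContainerResources(id, want); err != nil {
			pending = append(pending, id) // retry this container later
			continue
		}
		statusLimits[id] = want // reflect the new limits only on success
	}
	return pending
}
```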
Alpha Feature Code Issues: These are items and issues discovered during code review that need further discussion and must be addressed before Beta.
- ~~Can we figure out GetPodQOS differently once it is determined on pod create? See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663280487~~
- Fixed by https://github.com/kubernetes/kubernetes/pull/119665
- How do we deal with a pod that has 1m/1m CPU requests/limits? See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r662552642 (a shares-conversion sketch follows this list).
- Add internal representation of ContainerStatus.Resources in kubeContainer. Convert it to ContainerStatus.Resources in kubelet_pods generate functions. See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r662534632 and https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663151422 and https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663300123
- Can we get rid of resize mutex? Is there a better way to handle resize retries? See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663160060
- Can we recover from resize checkpoint store failures? See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663245975
- CRI clarification for ContainerStatus.Resources and how to handle runtimes that don't support it. See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663300347
- ~~Add real values to dockershim test for ContainerStatus.Resources https://github.com/kubernetes/kubernetes/pull/102884#discussion_r662521121~~
- Resolution: Not required due to dockershim deprecation.
- ~~Change PodStatus.Resources from v1.ResourceRequirements to *v1.ResourceRequirements~~
- Resolution: Fixed
- Address all places in the code that have 'TODO(vinaykul)'
- The current implementation does not work with the node topology manager enabled. This limitation is not captured in the KEP. Add this to the release documentation for alpha; we will address this in beta. See https://github.com/kubernetes/kubernetes/pull/102884#discussion_r676806049
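As background for the 1m/1m CPU question above, the following sketch shows the usual milliCPU-to-cpu.shares arithmetic (the constants mirror the commonly documented kubelet conversion, but treat this as an illustration rather than the exact kubelet code): very small requests clamp to the kernel minimum, so values like 1m and 2m become indistinguishable once applied to the cgroup, and limits hit a similar floor via the minimum CFS quota.

```go
package sketch

const (
	sharesPerCPU  = 1024 // cgroup cpu.shares corresponding to one full CPU
	milliCPUToCPU = 1000 // millicores per CPU
	minShares     = 2    // kernel-imposed minimum for cpu.shares
)

// milliCPUToShares converts a CPU request in millicores to cgroup shares.
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		// 1m -> 1 share, clamped to 2: the same value a 2m request produces,
		// which is why resizes in this range are effectively no-ops on the node.
		return minShares
	}
	return shares
}
```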
/assign @vinaykul
👋 Hey there @vinaykul. I'm a shadow on the 1.17 Release Team, working on Enhancements. We're tracking issues for the 1.17 release, and I wanted to reach out and ask whether we should track this (or, more specifically, the In-Place Update of Pod Resources feature) for 1.17?
The current release schedule is:
- Monday, September 23 - Release Cycle Begins
- Tuesday, October 15, EOD PST - Enhancements Freeze
- Thursday, November 14, EOD PST - Code Freeze
- Tuesday, November 22 - Docs must be completed and reviewed
- Monday, December 9 - Kubernetes 1.17.0 Released
We're only 5 days away from the Enhancements Freeze, so if you intend to graduate this capability in the 1.17 release, here are the requirements that you'll need to satisfy:
- KEP must be merged in `implementable` state
- KEP must define graduation criteria
- KEP must have a test plan defined
Thanks @vinaykul
Hi @jeremyrickard, I'll do my best to get this KEP to implementable state by next Tuesday, but it looks like a stretch at this point - the major item is to complete API review with @thockin, and that depends on his availability.
The actual code changes are not that big. Nevertheless, the safe option would be to track this for the 1.18.0 release; I'll update you by next Monday.
CC: @dashpole @derekwaynecarr @dchen1107
@jeremyrickard @mrbobbytables This KEP will take some more discussion - key thing is API review. It does not look like @thockin or another API reviewer is available soon. Could we please track this KEP for v1.18? Thanks,
/milestone v1.18
@PatrickLang Here's a first stab at the proposed CRI change to allow UpdateContainerResources to work with Windows. Please take a look; let's discuss in tomorrow's SIG meeting.
root@skibum:~/km16/staging/src/k8s.io/cri-api# git diff --cached .
diff --git a/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto b/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
index 0290d0f..b05bb56 100644
--- a/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
+++ b/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
@@ -924,14 +924,33 @@ message ContainerStatusResponse {
map<string, string> info = 2;
}
+// ContainerResources holds the fields representing a container's resource limits
+message ContainerResources {
+ // Resource configuration specific to Linux container.
+ LinuxContainerResources linux = 1;
+ // Resource configuration specific to Windows container.
+ WindowsContainerResources windows = 2;
+}
+
message UpdateContainerResourcesRequest {
// ID of the container to update.
string container_id = 1;
- // Resource configuration specific to Linux containers.
+ // Resource configuration specific to Linux container.
LinuxContainerResources linux = 2;
+ // Resource configuration specific to Windows container.
+ WindowsContainerResources windows = 3;
}
-message UpdateContainerResourcesResponse {}
+message UpdateContainerResourcesResponse {
+ // ID of the container that was updated.
+ string container_id = 1;
+ // Resource configuration currently applied to the Linux container.
+ LinuxContainerResources linux = 2;
+ // Resource configuration currently applied to the Windows container.
+ WindowsContainerResources windows = 3;
+ // Error message if UpdateContainerResources fails in the runtime.
+ string error_message = 4;
+}
message ExecSyncRequest {
// ID of the container.
diff --git a/staging/src/k8s.io/cri-api/pkg/apis/services.go b/staging/src/k8s.io/cri-api/pkg/apis/services.go
index 9a22ecb..9f1d893 100644
--- a/staging/src/k8s.io/cri-api/pkg/apis/services.go
+++ b/staging/src/k8s.io/cri-api/pkg/apis/services.go
@@ -44,7 +44,7 @@ type ContainerManager interface {
// ContainerStatus returns the status of the container.
ContainerStatus(containerID string) (*runtimeapi.ContainerStatus, error)
// UpdateContainerResources updates the cgroup resources for the container.
- UpdateContainerResources(containerID string, resources *runtimeapi.LinuxContainerResources) error
+ UpdateContainerResources(containerID string, resources *runtimeapi.ContainerResources) error
// ExecSync executes a command in the container, and returns the stdout output.
// If command exits with a non-zero exit code, an error is returned.
ExecSync(containerID string, cmd []string, timeout time.Duration) (stdout []byte, stderr []byte, err error)
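To illustrate the proposed signature change, here is a hypothetical caller-side sketch (not part of the diff above) of how the kubelet could build the new ContainerResources wrapper; the function name and import aliases are illustrative.

```go
package sketch

import (
	internalapi "k8s.io/cri-api/pkg/apis"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// resizeContainer shows how a caller would use the proposed interface: the same
// call site serves Linux and Windows containers by filling the matching field.
func resizeContainer(cm internalapi.ContainerManager, containerID string, cpuQuota, memLimitBytes int64) error {
	res := &runtimeapi.ContainerResources{
		Linux: &runtimeapi.LinuxContainerResources{
			CpuQuota:           cpuQuota,
			MemoryLimitInBytes: memLimitBytes,
		},
		// On Windows nodes the Windows field would be populated instead.
	}
	return cm.UpdateContainerResources(containerID, res)
}
```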
@vinaykul It looks like since the above PR was merged, this was removed from the API review queue. I believe you need to open a new PR that moves the state to implementable, and then add the API-review label to get it back in the queue and get a reviewer.
Edit: you should also include any other changes (e.g. windows CRI changes) required to move the feature to implementable in the PR as well.
@dashpole Thanks!
I've started a provisional mini-KEP for the CRI changes per our discussion last week (Dawn mentioned last week that we should take that up separately). IMHO the CRI changes do not block the implementation of this KEP, as they are between the kubelet and the runtime, and the user is not affected by them.
In a second commit to the same PR, I've addressed another key issue (update API failure handling), and requested the change to move the primary KEP to implementable.
With this, everything is in one place, and we can use it for API review.
Hey there @vinaykul -- 1.18 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating to alpha in 1.18?
The current release schedule is:
- Monday, January 6th - Release Cycle Begins
- Tuesday, January 28th EOD PST - Enhancements Freeze
- Thursday, March 5th, EOD PST - Code Freeze
- Monday, March 16th - Docs must be completed and reviewed
- Tuesday, March 24th - Kubernetes 1.18.0 Released
To be included in the release,
- The KEP PR must be merged
- The KEP must be in an implementable state
- The KEP must have test plans and graduation criteria.
If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍
We'll be tracking enhancements here: http://bit.ly/k8s-1-18-enhancements
Thanks! :)
@palnabarun Yes, I'm planning to work towards alpha code targets for this feature in 1.18. I've updated the KEP, adding test plan and graduation criteria sections, which I will be reviewing with SIG-Node this week, and I hope to get it to implementable before Jan 28. I'll update this thread if anything changes.
Thank you @vinaykul for the updates. :)
/stage alpha
/milestone v1.18
@vinaykul Just a friendly reminder, we are just 7 days away from the Enhancement Freeze (Tuesday, January 28th).
The KEP is still in provisional state and is missing test plans and graduation criteria.
@palnabarun Thanks for the reminder, I have sent PR #1342 to get them to implementable, and the change also adds test plans and graduation criteria to the KEPs. We discussed it in sig-node this morning and there were a couple of comments, which have been addressed. Once @derekwaynecarr and @dchen1107 have a chance to take a final look, we should be merging it. I'm optimistic it can be done by next Tuesday.
Awesome @vinaykul! Thanks for keeping it in priority. :)
As an added note, I am updating the issue comment here to have the KEP PR also linked.
@vinaykul Just a friendly reminder, we are just 2 days away from the Enhancement Freeze (3 PM Pacific Time, Tuesday, January 28th).
@palnabarun the KEP PR has been LGTM'd and merged. Please review and let me know if this has everything squared away now.
Amazing! Everything here looks fine with respect to the Enhancements Freeze. :)
Thank you for all the effort in getting this past the milestone.
A small nit though: the KEP linked here is stale, since it belongs to sig-node now.
Updated the issue description.
Hello @vinaykul,
I'm the 1.18 docs shadow.
Just want to know if this enhancement work planned for 1.18 requires any new docs (or modifications to existing docs)? If not, can you please update the 1.18 Enhancement Tracker Sheet (or let me know and I'll do so).
If so, just a friendly reminder that we're looking for a PR against k/website (branch dev-1.18) due by Friday, Feb 28th; it can just be a placeholder PR at this time. Warm regards,
chima
Hi @vinaykul, just a friendly reminder that the Code Freeze will go into effect on Thursday 5th March.
Can you please link all the k/k PRs or any other PRs which should be tracked for this enhancement?
Thank You :)
Hi @iheanyi1, yes, this feature will require updates to the documentation as we have an API change. I'm working on the API change code, and once it is approved/agreed upon I'll create a PR updating the relevant docs.
Thanks, Vinay
Thank you for the update @vinaykul. Just a friendly reminder that we're looking for a PR against k/website (branch dev-1.18) due by Friday, Feb 28th; it can just be a placeholder PR at this time. Keep up the good work; we hope you will meet the deadline we are working with.
Thanks
Hi @vinaykul, just a friendly reminder that the Code Freeze will go into effect on Thursday 5th March. Please list out any PRs for this enhancement.
Hi @vinaykul.
Just a friendly reminder we're looking for a PR against k/website (branch dev-1.18) due by Friday, Feb 28th, it can just be a placeholder PR at this time.
Thanks
@palnabarun @jeremyrickard @iheanyi1 While the code is getting there, we were a couple of hands short due to unforeseen circumstances, and some of my time was consumed by another priority for the company. So I won't be able to make the 1.18 release with high quality by the March 5th deadline - I don't have time to complete all the planned test cases and iterate over the code to ensure the feature is mostly bulletproof. Since this is a feature that touches fundamental structures and core k8s components, it had better be solid.
Could we please track this for the 1.19 release? I'll continue to work on it, and 1.19 should give us ample time for this to bake well.