[FG:InPlacePodVerticalScaling] Tracking TODO items to address pre-beta, at beta, GA, and GA+1
What would you like to be added?
This enhancement tracks various TODO items from alpha to GA for the In-Place Pod Vertical Scaling feature. To find pending TODO items in the k/k repo, run:
`git grep TODO | grep InPlacePodVerticalScaling`
- In pkg/kubelet/kubelet_pods.go, update PodStatus.Resources to include extended resources. Target: < Beta.
- In pkg/kubelet/kubelet.go, investigate calling kl.handlePodResourcesResize in HandlePodUpdates + periodic SyncLoop.
- In pkg/kubelet/kubelet.go, can we recover from a SetPodAllocation/SetPodResizeStatus checkpointing failure if it were to occur? Target: < Beta.
- In pkg/kubelet/kuberuntime/helpers_linux.go, address the issue that sets the min req/limit to 2m/10m (see the CPU-conversion sketch after this list). Target: < Beta.
- In pkg/kubelet/kuberuntime/kuberuntime_manager.go, figure out enforceMemoryQoS usage in a platform-agnostic way. Target: < Beta.
- In pkg/kubelet/cri/remote/remote_runtime.go, remove v1alpha2 support for Windows if confirmed as unnecessary. Target: < Beta.
- In test/e2e/node/pod_resize.go, remove featureGatePostAlpha var. Target: Beta.
- In pkg/apis/core/validation/validation.go, remove updatablePodSpecFieldsNoResources variable. Target: GA.
- In pkg/apis/core/validation/validation.go, investigate if PodStatus.QOSClass can replace qos.GetPodQOS(). Target: GA.
- In pkg/kubelet/container/helpers.go, remove HashContainerWithoutResources() and associated code. Target: GA+1.
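For context on the 2m/10m floor in the helpers_linux.go item above: with 1024 cpu.shares per CPU, the kernel's minimum of 2 shares corresponds to roughly 2m of CPU, and the minimum enforceable CFS quota of 1000µs over a 100000µs period corresponds to 10m. The sketch below shows that math; the constant values are the usual Linux floors, but the function names and structure are illustrative, not the exact kubelet code:

```go
package sketch

// Illustrative sketch of the cgroup CPU math behind the "min req/limit
// 2m/10m" TODO; names are hypothetical, values are the usual Linux floors.
const (
	milliCPUToCPU  = 1000   // millicores per CPU
	sharesPerCPU   = 1024   // cpu.shares granted per full CPU
	minShares      = 2      // kernel floor on cpu.shares (~2m CPU)
	quotaPeriod    = 100000 // default CFS period, microseconds
	minQuotaPeriod = 1000   // kernel floor on cfs_quota_us (~10m per period)
)

// milliCPUToShares converts a CPU request to cpu.shares; requests below
// ~2m get clamped up to the kernel minimum.
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		shares = minShares
	}
	return shares
}

// milliCPUToQuota converts a CPU limit to a CFS quota; limits below ~10m
// get clamped up to the minimum enforceable quota.
func milliCPUToQuota(milliCPU int64) int64 {
	if milliCPU == 0 {
		return 0 // no limit
	}
	quota := milliCPU * quotaPeriod / milliCPUToCPU
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return quota
}
```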
TODOs not tracked in code:
- Add and expose a helper function to get a pod's resource requirements and allocations, for use by metrics, kubectl describe, etc. (a sketch follows below).
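A minimal sketch of what such a shared helper could look like; `aggregateRequests` is a hypothetical name, not an existing Kubernetes helper. It assumes the v1 API types and the alpha ContainerStatus.AllocatedResources field, and omits init containers and pod overhead for brevity:

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// aggregateRequests sums per-container requests across a pod, preferring the
// allocated values recorded in status (if any) over the spec requests, so
// callers such as metrics or kubectl describe see the last admitted resize.
func aggregateRequests(pod *v1.Pod) v1.ResourceList {
	total := v1.ResourceList{}
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		requests := c.Resources.Requests
		for j := range pod.Status.ContainerStatuses {
			cs := &pod.Status.ContainerStatuses[j]
			if cs.Name == c.Name && cs.AllocatedResources != nil {
				requests = cs.AllocatedResources // last admitted allocation
				break
			}
		}
		for name, qty := range requests {
			if cur, ok := total[name]; ok {
				cur.Add(qty)
				total[name] = cur
			} else {
				total[name] = qty.DeepCopy()
			}
		}
	}
	return total
}
```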
Why is this needed?
These TODO items were found during review of PR https://github.com/kubernetes/kubernetes/pull/102884/, and it was agreed that they should not block alpha. Most need to be handled before Beta, and a few need to be addressed at GA or GA+1.
/sig node
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
@vinaykul: Reopened this issue.
/remove-lifecycle rotten
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
@esotsal: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@Karthik-K-N: Reopened this issue.
/assign
/remove-lifecycle rotten
/triage accepted
Updated list of TODOs:
- [ ] pkg/apis/core/validation/validation.go:5071 - Drop this var once InPlacePodVerticalScaling goes GA and featuregate is gone.
- [ ] pkg/kubelet/cm/cgroup_manager_linux.go:645 - Add memory request support
- [ ] pkg/kubelet/cm/cgroup_manager_linux.go:731 - Add memory request support
- [ ] pkg/kubelet/container/helpers.go:128 - Remove this in GA+1 and make HashContainerWithoutResources to become Hash (see the hash sketch after this list).
- [ ] pkg/kubelet/container/runtime.go:301 - Remove this in GA+1 and make HashWithoutResources to become Hash.
- [ ] pkg/kubelet/kubelet.go:1962 - Investigate doing this in HandlePodUpdates + periodic SyncLoop scan
- [ ] pkg/kubelet/kubelet.go:2588 - Can we recover from this in some way? Investigate
- [ ] pkg/kubelet/kubelet.go:2847 - Can we recover from this in some way? Investigate
- [ ] pkg/kubelet/kubelet.go:2855 - Can we recover from this in some way? Investigate
- [ ] pkg/kubelet/kubelet_pods.go:2107 - Update this to include extended resources in
- [ ] pkg/kubelet/kuberuntime/helpers_linux.go:63 - Address issue that sets min req/limit to 2m/10m before beta
- [ ] pkg/kubelet/kuberuntime/kuberuntime_container_linux_test.go:867 - Add unit tests for cgroup v1 & v2
- [ ] pkg/kubelet/kuberuntime/kuberuntime_manager.go:662 - Figure out best way to get enforceMemoryQoS value (parameter #4 below) in platform-agnostic way
- [ ] pkg/scheduler/internal/queue/scheduling_queue.go:1074 - Fix this to determine when a
- [ ] test/e2e/node/pod_resize.go:85 - Can we optimize this?
- [ ] test/e2e/node/pod_resize.go:334 - Is there a better way to determine this?
- [ ] test/e2e/node/pod_resize.go:500 - Remove this check once base-OS updates to containerd>=1.6.9
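To illustrate the HashContainerWithoutResources entries in the list above: the container hash has to ignore Resources so that an in-place resize is not mistaken for a spec change that requires a restart. The sketch below is illustrative only; the real kubelet hashes the spec with its own deep-hash utility, and `hashContainerIgnoringResources` is a hypothetical name:

```go
package sketch

import (
	"encoding/json"
	"hash/fnv"

	v1 "k8s.io/api/core/v1"
)

// hashContainerIgnoringResources hashes a container spec with Resources
// cleared, so that resizing CPU/memory in place does not change the hash
// and therefore does not trigger a container restart.
func hashContainerIgnoringResources(c *v1.Container) (uint64, error) {
	clone := c.DeepCopy()
	clone.Resources = v1.ResourceRequirements{} // resizes must not affect the hash
	data, err := json.Marshal(clone)            // deterministic: JSON sorts map keys
	if err != nil {
		return 0, err
	}
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64(), nil
}
```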
@esotsal would you mind updating the issue description to merge it with this list?
@tallclair @esotsal I've converted it to a task list, please review. For the checkpoint failure TODOs, I want to toss out the node-local checkpointing code entirely and rely on podStatus as the source of truth. (Please see https://github.com/kubernetes/kubernetes/pull/119665.) @ndixita Does it still make sense to support setting the memory request?
- [ ] pkg/kubelet/container/runtime.go:301 - Remove this in GA+1 and make HashWithoutResources to become Hash.
- [ ] pkg/kubelet/kubelet.go:1962 - Investigate doing this in HandlePodUpdates + periodic SyncLoop scan
With the merge of https://github.com/kubernetes/kubernetes/pull/124220, I think both issues have been addressed.
@esotsal would you mind updating the issue description to merge it with this list?
Unfortunately @tallclair I am not allowed to modify issue description :-( , perhaps better to assign @vinaykul for this or close this issue and use https://github.com/orgs/kubernetes/projects/178 instead to track TODOs ?
I've updated it. I don't suppose there is a way to transfer ownership of the issue, is there? (I can periodically keep it updated as bandwidth permits, but let's use the project board for the most current info.)
- test/e2e/node/pod_resize.go:334 - Is there a better way to determine this? => Please check here proposal
- test/e2e/node/pod_resize.go:85 - Can we optimize this? => Please check here proposal
- test/e2e/node/pod_resize.go:500 - Remove this check once base-OS updates to containerd>=1.6.9 => Please check here proposal
With the merge of #124296 we can update test/e2e accordingly and close those three TODOs, I believe.
Is there a plan for https://kubernetes.io/docs/concepts/workloads/pods/downward-api/#downwardapi-resourceFieldRef ? I guess this would work only for files?
/unassign