Cluster API Informing kube-reserved and system-reserved
/area api /area kubelet /area bootstrap
User Story
As a cluster operator, as I begin to load my cluster with workloads without system-reserved or kube-reserved being set, my cluster can start exhibiting strange behaviour as various processes are OOM-killed. I would like Cluster API to provide a mechanism to set these values based on my infrastructure.
Detailed Description
Production clusters should have system-reserved and kube-reserved set for the cluster to function correctly. Some deployers make opinionated guesses for these based on the memory on the system; EKS, for example, calculates them from the instance type.
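To make the "opinionated guess" concrete, here is a minimal sketch of the kind of tiered formula some providers use to derive a memory reservation from node capacity. The tier sizes and percentages below follow the commonly cited GKE-style heuristic and are assumptions for illustration; EKS uses a different formula that also factors in the instance type's max pod count.

```python
# Illustrative only: a tiered formula for memory to reserve for system
# daemons and the kubelet, as a function of node capacity. The tier
# boundaries and fractions are assumptions (GKE-style heuristic), not
# anything Cluster API itself prescribes.

def reserved_memory_mib(capacity_gib: float) -> float:
    """Approximate memory (MiB) to reserve on a node of the given size."""
    tiers = [          # (tier size in GiB, fraction reserved)
        (4, 0.25),     # 25% of the first 4 GiB
        (4, 0.20),     # 20% of the next 4 GiB
        (8, 0.10),     # 10% of the next 8 GiB
        (112, 0.06),   # 6% of the next 112 GiB
    ]
    reserved = 0.0
    remaining = capacity_gib
    for size, fraction in tiers:
        take = min(remaining, size)
        reserved += take * fraction
        remaining -= take
    reserved += remaining * 0.02  # 2% of anything above 128 GiB
    return reserved * 1024


if __name__ == "__main__":
    for gib in (4, 8, 16, 64):
        print(f"{gib:>3} GiB node -> reserve ~{reserved_memory_mib(gib):.0f} MiB")
```

A CAPI sizing provider could run a calculation like this per machine, which is exactly what a static KubeadmConfig cannot do across heterogeneous instance types.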
KEP 2369 introduced the possibility of dynamic sizing providers for the kubelet, and CAPI could potentially act as a control point for its own kubelet sizing provider, e.g. add something to the MachineSpec, along with reconciliation mechanisms to copy that information across, but the proposal was not pursued. There is also likely to be a Cluster Autoscaler impact, so @elmiko might be interested too:
```yaml
spec:
  reservations:
    kubelet: 2Gi
    system: 3Gi
```
The alternative is to plug something into KubeadmConfig and have kubeadm set the values via the kubelet startup flags. I'm not a fan of this as it's not dynamic, and it also couples bootstrap configuration even more closely to the machine configuration, which is something we want to decouple (#5294).
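For reference, the static flavour of this alternative is already possible today. A sketch of what it looks like through the kubeadm bootstrap provider's `kubeletExtraArgs` field (resource names and values are illustrative, not recommendations):

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: md-with-reservations   # hypothetical name
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            # Static values baked in at bootstrap time; the same numbers
            # apply to every machine stamped from this template, which is
            # exactly the non-dynamic limitation described above.
            kube-reserved: "cpu=200m,memory=2Gi"
            system-reserved: "cpu=200m,memory=3Gi"
```

This illustrates the coupling problem: the reservation values live in the bootstrap config, so changing them for a different instance type means a new template rather than a per-machine calculation.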
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
/kind feature
@randomvariable: The label(s) area/kubelet cannot be applied, because the repository doesn't have them.
In response to this:
/area api /area kubelet /area bootstrap
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @perithompson @jayunit100
@randomvariable I assume dynamic KubeletConfiguration is not an option because of: https://github.com/kubernetes-sigs/cluster-api/issues/4464#issuecomment-818813951?
Dynamic kubelet configuration is definitely deprecated if not completely deleted by now. There's a reference to that in the supporting KEP.
Some more context in: https://github.com/kubernetes/enhancements/issues/281
I wasn't aware of the deprecation. Looks like it will be dropped with 1.23. https://github.com/kubernetes/kubernetes/pull/102966/files:
```go
fs.MarkDeprecated("dynamic-config-dir", "Feature DynamicKubeletConfig is deprecated in 1.22 and will not move to GA. It is planned to be removed from Kubernetes in the version 1.23. Please use alternative ways to update kubelet configuration.")
```
DKC may be dropped in 1.23. It's actually difficult to get rid of due to test bindings. It should not be used. Without DKC there is no true dynamic reconfiguration mechanism, and changes require kubelet restarts. Depending on rollout, this can leave a cluster without running kubelets for a period of time.
In terms of setting this only at bootstrap, CAPI could start accepting the kubelet config as a blob, but kubeadm join doesn't support that yet. My idea there was to allow joining nodes to patch their kubelet config, but this means CAPI needs support for kubeadm patches.
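For context, upstream kubeadm does have a `patches` mechanism (a directory of patch files, with a `kubeletconfiguration` patch target added later, in kubeadm v1.25). A sketch of what such a patch file might look like, with illustrative values:

```yaml
# Hypothetical file at the node's kubeadm patches directory, e.g.
# /etc/kubernetes/patches/kubeletconfiguration+strategic.yaml
# The "kubeletconfiguration" target lets a joining node override
# fields of its KubeletConfiguration without DKC.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: "200m"
  memory: "2Gi"
systemReserved:
  cpu: "200m"
  memory: "3Gi"
```

If CAPI grew support for rendering per-machine patch files like this, the reservation values could be computed per machine while still flowing through kubeadm.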
/milestone Next /kind proposal
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/lifecycle frozen
/help
In order to set those values, benchmark data is required; sizing also depends on the number of target objects to manage (clusters/machines), and can be impacted by the performance of the hardware used for the management cluster.
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/priority important-longterm
The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.
This issue has not been active since 2021. To mitigate, we documented how to configure the kubelet: https://cluster-api.sigs.k8s.io/tasks/bootstrap/kubeadm-bootstrap/kubelet-config.html?highlight=kubelet#kubelet-configuration
/close
@fabriziopandini: Closing this issue.
In response to this:
/close