enhancements KEP-4471: kubeadm: make a control-plane's kubelet talk to the local API Server on kubeadm join.

One-line PR description: adding new KEP

Issue link: #4471

Other comments:

Feb 08 '24 09:02 chrischdi

cc @neolit123 @fabriziopandini @sbueringer :-) PTAL

Feb 08 '24 09:02 chrischdi

FYI: I'm on PTO the next days and will be back on 15th. Will start addressing feedback then :-)

Feb 08 '24 18:02 chrischdi

Sounds reasonable overall, but Fabrizio and Lubomir are way more familiar with the kubeadm details

Feb 09 '24 10:02 sbueringer

Pointing the control-plane's kubelet to the local kube-apiserver may cause some stability issues. When kube-apiserver fails, the local kubelet will crash, and the node will become NotReady, the entire control-plane node cannot handle the new Pod. However, using LB will not cause the entire control-plane node to be NotReady, just some components can not work well. The second point is that upgrading the kubelet binary version seems to be user control, and kubeadm only modifies the kubelet configurations and does not immediately update the binary version of kubelet. https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-kubelet-and-kubectl

When kubeadm join is executed, version skew does not occur, and that only occurs during the upgrade phase.

Feb 25 '24 03:02 SataQiu

Pointing the control-plane's kubelet to the local kube-apiserver may cause some stability issues. When kube-apiserver fails, the local kubelet will crash, and the node will become NotReady, the entire control-plane node cannot handle the new Pod.

I agree, I will add that information. One thought: in case of using kubeadm, kube-apiserver is run as static pod anyway, so shouldn't the kubelet be able to bring it up again?

An improvement may be to only use the local endpoint during TLS Bootstrap and make kubeadm change the config afterwards back to the load balancer 🤔

However, using LB will not cause the entire control-plane node to be NotReady, just some components can not work well. The second point is that upgrading the kubelet binary version seems to be user control, and kubeadm only modifies the kubelet configurations and does not immediately update the binary version of kubelet. kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-kubelet-and-kubectl

When kubeadm join is executed, version skew does not occur, and that only occurs during the upgrade phase.

In case of immutable upgrades (e.g. Cluster API) where you update the whole image and the image contains the binary versions: when correctly following the version skew you would have to (example: v1.x -> v1.y):

have one image A which has v1.y kubeadm + v1.x kubelet
have one image B which has v1.y kubeadm + v1.y kubelet

And upgrade via:

rolling update control-plane nodes to image A
rolling update control-plane nodes to image B

Everyone nowadays skips step 1 with CAPI (because normally it works).

Feb 26 '24 07:02 chrischdi

Pointing the control-plane's kubelet to the local kube-apiserver may cause some stability issues. When kube-apiserver fails, the local kubelet will crash, and the node will become NotReady, the entire control-plane node cannot handle the new Pod.

I agree, I will add that information. One thought: in case of using kubeadm, kube-apiserver is run as static pod anyway, so shouldn't the kubelet be able to bring it up again?

this is something that i mentioned/asked on the kubeadm office hours zoom call we had. i think the new way is less HA, and less resilient to apiserver component faliures.

currently with a kubelet pointing at the LB if a local apiserver fails (e.g. due to an internal apiserver bug) the kubelet will try to find a healthy kube-apiserver and the Node will remain healthy, while in parallel it will try to restart the local filing pod. with the new way if the apiserver fails the node will become not ready.

An improvement may be to only use the local endpoint during TLS Bootstrap and make kubeadm change the config afterwards back to the load balancer 🤔

bootstrap to local apiserver makes sense. but if it then points to the LB it can still violate the skew policy, which according to the prior discussions from api-machinery can cause problems. we haven't seen such problems yet, mainly because of k8s kubelet / apiserver stability.

i was thinking that maybe we can let the user control this behavior with an option in the config API. i personally don't like this idea, but it can be added in v1beta4 if people vote +1 for that.

alternatively, the default behavior can be - point to local, but they can manually edit to point to LB if they want more HA.

Feb 26 '24 08:02 neolit123

@SataQiu @pacoxu any more comments here?

i was thinking that maybe we can let the user control this behavior with an option in the config API. i personally don't like this idea, but it can be added in v1beta4 if people vote +1 for that.

still not a fan of making it configurable as pointing to the LB can violate kubelet / apiserver skew, but to make it configurable by kubeadm we can do one of these things:

add a new patchtarget kubeletkubeconfig. problem with this is that the user will likely only use this patch target for changing server and it will be a rare use case.
add clusterconfiguration.pointKubeletAtControlPlaneEndpoint false by default, still a rare use case.
document somewhere (maybe in the kubeadm HA docs at k/website) that users can edit a node kubelet.conf kubeconfig file to point to LB and kubeadm or kubelet will not modify the contents of this file for e.g. cert rotation. i.e. the server value will persist.

my vote goes to 3. WDYT?

EDIT: and yes we should still have a FG while the feature graduates.

Apr 29 '24 06:04 neolit123

also vote to 3 When designing the kubeadm upgrade apply phases, we should support skipping the kubelet upgrade phase. Then users can upgrade the kubelet version after ensuring that all the control-plane apiserver instances have been upgraded.

Just like kubeadm upgrade node --skip-phases kubelet-config

FYI: It seems that kubeadm just configures kubelet, but not restart the kubelet service. We can state in the documentation that please restart the kubelet service in turn after all the control-plane instances have been upgraded.

Apr 29 '24 10:04 SataQiu

also vote to 3

👍

When designing the kubeadm upgrade apply phases, we should support skipping the kubelet upgrade phase. Then users can upgrade the kubelet version after ensuring that all the control-plane apiserver instances have been upgraded.

upgrade does not touch "etc/kubernetes/kubelet.conf" today, so if the user modifies the "server" in there it will persist.

FYI: It seems that kubeadm just configures kubelet, but not restart the kubelet service. We can state in the documentation that please restart the kubelet service in turn after all the control-plane instances have been upgraded.

yes, a kubelet restart will be required. they have to do it one time after changing the value of "server"

Apr 29 '24 11:04 neolit123

3. document somewhere (maybe in the kubeadm HA docs at k/website) that users can edit a node kubelet.conf kubeconfig file to point to LB and kubeadm or kubeadm will not modify the contents of this file for e.g. cert rotation. i.e. the server value will persist.

IIUC, kubelet.conf change is a behavior change or a bugfix.

Need we notice cluster-api or other kubeadm adapters about this?

vote to 2 and 3. 3 may be the right direction, but 2 is also a good choice to me.

Apr 30 '24 08:04 pacoxu

document somewhere (maybe in the kubeadm HA docs at k/website) that users can edit a node kubelet.conf kubeconfig file to point to LB and kubeadm or kubeadm will not modify the contents of this file for e.g. cert rotation. i.e. the server value will persist.

IIUC, kubelet.conf change is a behavior change or a bugfix.

we can say it's a bug fix for potential skew problems that introduces a change in behavior.

Need we notice cluster-api or other kubeadm adapters about this?

vote to 2 and 3. 3 may be the right direction, but 2 is also a good choice to me.

we could just document it as our first choice (3), but if users really need better flexibility, we can think about alternatives like 2 or something else.

Apr 30 '24 09:04 neolit123

Need we notice cluster-api or other kubeadm adapters about this? vote to 2 and 3. 3 may be the right direction, but 2 is also a good choice to me. we could just document it as our first choice (3), but if users really need better flexibility, we can think about alternatives like 2 or something else.

3 is ok to go. +1 2 can be a later choice if many users really want that.

Apr 30 '24 10:04 pacoxu

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-cluster-lifecycle/OWNERS~~ [neolit123]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

May 07 '24 14:05 k8s-ci-robot

/lgtm

May 07 '24 16:05 neolit123

enhancements enhancements copied to clipboard

KEP-4471: kubeadm: make a control-plane's kubelet talk to the local API Server on kubeadm join.

enhancements
enhancements copied to clipboard