cluster-api
Ensuring Control Plane Labels/Taints Persist Through Node Replacement
What would you like to be added (User Story)?
CAPI should ensure control plane labels and taints are reapplied to replacement nodes after node deletion
Detailed Description
Cluster control plane nodes have a set of labels and taints which are applied by kubeadm on node creation.
If a control plane node is later deleted for any reason and then recreated, e.g. by restarting kubelet, it is necessary to ensure that those labels and taints are applied to the replacement node.
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.
I'm wondering if we are fixing the issue in the right layer.
Wouldn't this issue also happen if you are using plain kubeadm without CAPI? If yes, the fix should probably be discussed in kubeadm first.
I don't know what the right fix is here, but if kubeadm reset is called and a node is then deleted, the labels and taints would presumably persist in a config file on disk. So if kubeadm join is called with that config, a new node will be created with the old labels and taints.
We also have to figure out how this fits into the label & taint propagation story.
We need a deeper discussion about this.
There are also bootstrap errors that can lead to similar problems, and in this case we should probably not try to fix issues from CAPI/KCP. But fixing user errors after bootstrap is also tricky; remediation might be a better strategy...
/triage accepted
/priority important-longterm
/help
To drive the initial research work (too early to implement anything at this stage)
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.