
Manage Etcd peers using custom resources

jnummelin opened this pull request 1 year ago

One of the challenges for automation (Terraform, Cluster API, k0sctl) is that when removing a controller node, one also has to remove the corresponding peer entry from the etcd cluster.

In this PR there are a few things going on:

  • Each controller creates its own EtcdMember object with memberID and peer address fields
  • Each controller watches for changes on EtcdMember objects
  • When the leader controller sees an EtcdMember object being deleted, it removes the corresponding peer from the etcd cluster (see the sketch after this list)
  • If the peer removal fails for some reason, it writes an event into the kube-system namespace
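
To make the leader-side step concrete, here is a minimal, hypothetical Go sketch; it is not this PR's actual code. It only assumes the EtcdMember object carries the member ID as a hex string (as in the example object later in this thread) and uses the upstream go.etcd.io/etcd/client/v3 API to drop the peer.

package main

import (
	"context"
	"fmt"
	"strconv"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// removeEtcdPeer removes the peer identified by a hex-encoded member ID from
// the etcd cluster. The leader controller would do something like this when
// it sees an EtcdMember object being deleted (or marked to leave).
func removeEtcdPeer(ctx context.Context, endpoints []string, hexMemberID string) error {
	memberID, err := strconv.ParseUint(hexMemberID, 16, 64)
	if err != nil {
		return fmt.Errorf("invalid member ID %q: %w", hexMemberID, err)
	}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return fmt.Errorf("connecting to etcd: %w", err)
	}
	defer cli.Close()

	// MemberRemove fails if the ID is unknown or if removing the member would
	// break the cluster; per the PR description, such failures are surfaced as
	// events in the kube-system namespace.
	if _, err := cli.MemberRemove(ctx, memberID); err != nil {
		return fmt.Errorf("removing etcd member %x: %w", memberID, err)
	}
	return nil
}

func main() {
	// Hypothetical endpoint and member ID, for illustration only.
	if err := removeEtcdPeer(context.Background(),
		[]string{"https://127.0.0.1:2379"}, "21e943ed8772de19"); err != nil {
		fmt.Println("peer removal failed:", err)
	}
}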

Still very much a DRAFT; some of the code needs refactoring for better readability and error handling.

Description

Fixes # (issue)

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation update

How Has This Been Tested?

  • [ ] Manual test
  • [ ] Auto test added

Checklist:

  • [ ] My code follows the style guidelines of this project
  • [ ] My commit messages are signed-off
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] Any dependent changes have been merged and published in downstream modules
  • [ ] I have checked my code and corrected any misspellings

jnummelin avatar Mar 28 '24 09:03 jnummelin

This pull request has merge conflicts that need to be resolved.

github-actions[bot] avatar Mar 28 '24 09:03 github-actions[bot]

This pull request has merge conflicts that need to be resolved.

github-actions[bot] avatar Apr 17 '24 14:04 github-actions[bot]

two questions:

  1. Will "left over" etcd members (e.g. from past-controller changes, happened to me once after changing node IPs) be cleaned up?
  2. What would controller removal look like overall? Especially when the node died without cleaning up after itself.

pschichtel avatar Apr 19 '24 13:04 pschichtel

Will "left over" etcd members (e.g. from past-controller changes, happened to me once after changing node IPs) be cleaned up?

Not sure I get what you mean by "left over" member here? Do you mean you had a stale member in the etcd cluster, i.e. a node with the old IP / hostname? If yes, then those can also be cleaned up by marking the node to leave the etcd cluster.

What would controller removal look like overall? Especially when the node died without cleaning up after itself.

Using the EtcdMember object, you'd remove a node by running something like:

kubectl patch etcdmember controller2 -p '{"leave":true}' --type merge

That triggers the current leader controller to remove the given member from etcd. Of course, if the node died without cleaning up after itself, you'd still have leftover data etc. on the node.
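
For automation tools (the Terraform / Cluster API / k0sctl case from the description), the same thing could be done programmatically. A minimal sketch, not part of this PR, using the Kubernetes dynamic client; the resource plural "etcdmembers", the cluster scope, and the kubeconfig path are assumptions on my part:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed admin kubeconfig location on a k0s controller.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/var/lib/k0s/pki/admin.conf")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Group/version come from the example object in this thread; the plural
	// "etcdmembers" is an assumption.
	gvr := schema.GroupVersionResource{
		Group:    "etcd.k0sproject.io",
		Version:  "v1beta1",
		Resource: "etcdmembers",
	}

	// Same merge patch as the kubectl example above: mark controller2 to leave.
	_, err = client.Resource(gvr).Patch(context.Background(), "controller2",
		types.MergePatchType, []byte(`{"leave":true}`), metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("requested etcd leave for controller2")
}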

jnummelin avatar Apr 22 '24 08:04 jnummelin

Not sure I get what you mean by "left over" member here? Do you mean you had a stale member in the etcd cluster, i.e. a node with the old IP / hostname?

Yes, exactly. So the "stale" members would also just show up as EtcdMember objects and I can then just delete them, is that right?

Regarding the controller removal: I was more interested in whether there would be any automation, e.g. a finalizer on Node objects, that makes sure the corresponding etcd member is deleted as well. So would "delete the EtcdMember object" be an additional step in the process of removing a controller node, or would it happen automatically under normal circumstances?

pschichtel avatar Apr 22 '24 10:04 pschichtel

automation, e.g. a finalizer on Node objects

We cannot use the Node object for this purpose; for plain controllers there's no Node object in k0s.

So would "delete the etcdmember object" be an additional step in the process of removing a controller node or would it happen automatically under normal circumstances.

Once the member has left the etcd cluster and the object status reflects that, you can safely remove the object. k0s will NOT do that automatically, as we wanted to keep some visibility into the actions already taken. My initial idea was actually to rely on object deletion, but that does not keep any visible history of what has happened.

Also, in case you remove, say, the controller5 member and later re-join that same node (same hostname), k0s will be able to re-join it in etcd, and the object status will be updated.

jnummelin avatar Apr 23 '24 08:04 jnummelin

I accidentally converted this to a draft and then made it ready for review again; disregard my updates.

juanluisvaladas avatar May 02 '24 12:05 juanluisvaladas

Currently, an EtcdMember custom resource looks like the following:

apiVersion: etcd.k0sproject.io/v1beta1
kind: EtcdMember
memberID: 21e943ed8772de19
metadata:
  creationTimestamp: "2024-05-03T11:32:11Z"
  generation: 1
  name: etcdmember-controller-2
  resourceVersion: "362"
  uid: 641f2594-881a-4e8f-939e-94535d6b5904
peerAddress: 172.27.200.93
status:
  conditions:
  - status: "True"
    type: Joined

Wouldn't it be better not to have anything other than apiVersion/kind/metadata/spec/status as top-level fields?

So what about this as an alternative:

--- etcdmember-controller-2
+++ etcdmember-controller-2
@@ -1,14 +1,16 @@
 apiVersion: etcd.k0sproject.io/v1beta1
 kind: EtcdMember
-memberID: 21e943ed8772de19
 metadata:
   creationTimestamp: "2024-05-03T11:32:11Z"
   generation: 1
   name: etcdmember-controller-2
   resourceVersion: "362"
   uid: 641f2594-881a-4e8f-939e-94535d6b5904
-peerAddress: 172.27.200.93
+spec:
+  leave: false
 status:
   conditions:
   - status: "True"
     type: Joined
+  memberID: 21e943ed8772de19
+  peerAddress: 172.27.200.93

Moreover, the lastTransitionTime is not present on the condition. There's also lastProbeTime on stock Kubernetes conditions.
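
For illustration only, a rough Go sketch of what that alternative shape could look like as CRD types; the type and field names are my assumptions, not this PR's actual code. Using the stock metav1.Condition type would bring lastTransitionTime (plus reason and message) along for free:

package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EtcdMember describes one etcd peer managed by k0s (hypothetical shape).
type EtcdMember struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   EtcdMemberSpec   `json:"spec,omitempty"`
	Status EtcdMemberStatus `json:"status,omitempty"`
}

// EtcdMemberSpec carries the user-settable intent: only the leave flag.
type EtcdMemberSpec struct {
	// Leave requests that the leader removes this peer from the etcd cluster.
	Leave bool `json:"leave,omitempty"`
}

// EtcdMemberStatus carries the state observed by the controllers.
type EtcdMemberStatus struct {
	// Conditions uses the stock Kubernetes condition type, which includes
	// lastTransitionTime, reason and message in addition to type/status.
	Conditions  []metav1.Condition `json:"conditions,omitempty"`
	MemberID    string             `json:"memberID,omitempty"`
	PeerAddress string             `json:"peerAddress,omitempty"`
}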

twz123 avatar May 03 '24 11:05 twz123