terraform-provider-k0s
Node removal not supported, but it "works" in an unexpected way
Some time ago, k0sctl added support for node removal.
This provider calls the phase needed to reset controllers, but it never prepares the hosts list so that removed hosts can actually be removed. The ClusterResourceModelHost data structure lacks a Reset field, and there is no logic that translates a host's removal from the state into a flag update the phase manager could pick up.
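The missing translation step could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the provider's code: `Host` is a hypothetical, simplified stand-in (not the real ClusterResourceModelHost), hosts are assumed to be matched by address, and `Reset` is assumed to mirror the per-host reset option that k0sctl's phase manager acts on. Hosts present in the prior state but absent from the plan are appended with `Reset: true` instead of being silently dropped.

```go
package main

import "fmt"

// Host is a HYPOTHETICAL, simplified host model for illustration only.
// Reset is assumed to mirror k0sctl's per-host reset option.
type Host struct {
	Address string
	Role    string
	Reset   bool
}

// withRemovedHostsMarked returns the planned hosts plus every host that
// exists in the prior state but not in the plan, flagged with Reset = true
// so that a reset phase could pick them up. Hosts are matched by Address.
func withRemovedHostsMarked(prior, planned []Host) []Host {
	inPlan := make(map[string]bool, len(planned))
	for _, h := range planned {
		inPlan[h.Address] = true
	}
	out := append([]Host(nil), planned...)
	for _, h := range prior {
		if !inPlan[h.Address] {
			h.Reset = true // h is a copy, so the prior slice is not modified
			out = append(out, h)
		}
	}
	return out
}

func main() {
	prior := []Host{
		{Address: "10.0.0.1", Role: "controller"},
		{Address: "10.0.0.2", Role: "controller"},
		{Address: "10.0.0.3", Role: "controller"},
	}
	planned := []Host{
		{Address: "10.0.0.1", Role: "controller"},
		{Address: "10.0.0.2", Role: "controller"},
	}
	for _, h := range withRemovedHostsMarked(prior, planned) {
		fmt.Printf("%s reset=%v\n", h.Address, h.Reset)
	}
}
```

Matching by address is exactly what makes the same-IP case below dangerous: a replacement VM with the old IP is indistinguishable from the removed host unless something else (a VM ID, for example) is tracked as well.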
This is especially problematic when, after a removal, a new host is added with the same IP, because the IP is the unique ID in many k0s structures. The result is a split-brain: the cluster keeps connecting to the new VM through an IP that was never removed from its records (mainly from etcd), while the new VM gets stuck in the cluster init phase yet serves requests immediately. Since control-plane HA requires a load balancer, without sophisticated health checks the load balancer can easily end up serving two clusters at the same time.
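One way to catch this before a re-added controller joins is to check whether etcd still has a member registered under the joining IP. The sketch below assumes that `k0s etcd member-list` on a live controller prints JSON shaped like `{"members":{"<name>":"https://<ip>:2380"}}`; that shape, the `staleMembers` helper, and how the JSON is fetched (not shown) are all assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// memberList models the ASSUMED output of `k0s etcd member-list`:
// {"members":{"<name>":"https://<ip>:2380"}}
type memberList struct {
	Members map[string]string `json:"members"`
}

// staleMembers (hypothetical helper) returns the names of etcd members whose
// peer URL host equals joiningIP. A non-empty result means the cluster still
// knows a member under the address the new node is about to reuse.
func staleMembers(raw []byte, joiningIP string) ([]string, error) {
	var ml memberList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return nil, err
	}
	var names []string
	for name, peer := range ml.Members {
		u, err := url.Parse(peer)
		if err != nil {
			return nil, fmt.Errorf("member %q: %w", name, err)
		}
		if u.Hostname() == joiningIP {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; keep output stable
	return names, nil
}

func main() {
	raw := []byte(`{"members":{"ctrl-1":"https://10.0.0.1:2380","ctrl-3":"https://10.0.0.3:2380"}}`)
	names, err := staleMembers(raw, "10.0.0.3")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [ctrl-3]
}
```

A provider could refuse to proceed (or trigger the etcd leave workaround) whenever this returns a non-empty list, rather than letting the new VM join under a stale identity.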
According to the docs, the workaround appears to be running k0s etcd leave --peer-address IP_ADDR manually on a live node. In most cases that is the node we want to delete, but it gets tricky when we are rebuilding a crashed VM. Automating it is harder still, because destroy-time provisioners in Terraform only run on a clean destroy, not even when the resource is tainted.
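The decision of where to run the documented workaround can be sketched as follows. `Controller`, its `Alive` flag, and `leaveCommand` are hypothetical; how liveness is probed and how the command is executed remotely (e.g. over SSH) are left out. The only real command used is the `k0s etcd leave --peer-address` invocation quoted above. The sketch prefers the node being removed while it is still alive, and falls back to any other live controller for the crashed-VM case.

```go
package main

import "fmt"

// Controller is a HYPOTHETICAL record of a controller host; Alive would come
// from some reachability probe that is not shown here.
type Controller struct {
	Address string
	Alive   bool
}

// leaveCommand picks the host on which to run
// `k0s etcd leave --peer-address <removedIP>` and returns it with the argv.
// It prefers the node being removed if it is still alive; otherwise it uses
// any other alive controller (e.g. when rebuilding a crashed VM).
func leaveCommand(ctrls []Controller, removedIP string) (string, []string, error) {
	args := []string{"k0s", "etcd", "leave", "--peer-address", removedIP}
	for _, c := range ctrls {
		if c.Address == removedIP && c.Alive {
			return c.Address, args, nil
		}
	}
	for _, c := range ctrls {
		if c.Address != removedIP && c.Alive {
			return c.Address, args, nil
		}
	}
	return "", nil, fmt.Errorf("no alive controller to run etcd leave for %s", removedIP)
}

func main() {
	ctrls := []Controller{
		{Address: "10.0.0.1", Alive: true},
		{Address: "10.0.0.2", Alive: true},
		{Address: "10.0.0.3", Alive: false}, // crashed VM being rebuilt
	}
	host, args, err := leaveCommand(ctrls, "10.0.0.3")
	if err != nil {
		panic(err)
	}
	fmt.Println(host, args) // prints 10.0.0.1 [k0s etcd leave --peer-address 10.0.0.3]
}
```

Running this from the provider's own apply logic, rather than from a destroy-time provisioner, would sidestep the taint limitation mentioned above.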
This is not yet supported in k0sctl itself - https://github.com/k0sproject/k0sctl/issues/603