etcd-cluster-operator
Removing a leader member during scale-down causes cluster downtime
In #93 we remove the etcd member whose name contains the largest ordinal, but this member may well be the cluster leader. Removing the leader forces a leader election, and write requests are blocked until a new leader is elected: https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#why-does-etcd-lose-its-leader-from-disk-latency-spikes
This is compounded if we are removing multiple members and the newly elected leader also happens to have the next largest ordinal, since each removal then triggers another election.
Instead, if we removed only non-leader members, we might avoid these disruptions.
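
To make the proposal concrete, here is a minimal sketch of the selection logic, not the operator's actual code. It assumes member names end in `-<ordinal>` (for example `my-cluster-2`) and that the caller already knows the current leader's name. The helpers `ordinal` and `pickMemberToRemove` are hypothetical names introduced for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinal extracts the trailing "-N" ordinal from a member name.
// Assumption: names look like "my-cluster-2"; this is not taken from the operator.
func ordinal(name string) (int, error) {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return 0, fmt.Errorf("member name %q has no ordinal suffix", name)
	}
	return strconv.Atoi(name[i+1:])
}

// pickMemberToRemove (hypothetical helper) returns the non-leader member
// with the largest ordinal, so that scale-down never removes the leader.
func pickMemberToRemove(members []string, leader string) (string, error) {
	best, bestOrd := "", -1
	for _, m := range members {
		if m == leader {
			continue // skip the leader to avoid forcing an election
		}
		ord, err := ordinal(m)
		if err != nil {
			return "", err
		}
		if ord > bestOrd {
			best, bestOrd = m, ord
		}
	}
	if best == "" {
		return "", fmt.Errorf("no non-leader member available to remove")
	}
	return best, nil
}

func main() {
	members := []string{"my-cluster-0", "my-cluster-1", "my-cluster-2"}
	// The leader holds the largest ordinal, so the next largest is chosen.
	victim, err := pickMemberToRemove(members, "my-cluster-2")
	fmt.Println(victim, err)
}
```

One consequence of this approach is that the remaining members no longer have contiguous ordinals, which may matter if the pods are managed by something that expects to drop the highest ordinal first.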