Removing a leader member during scale-down causes cluster downtime

Open wallrj opened this issue 5 years ago • 0 comments

In #93 we remove the etcd member whose name contains the largest ordinal, but this member may well be the cluster leader. Removing the leader forces a leader election, and write requests cannot be served until a new leader is elected. See https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#why-does-etcd-lose-its-leader-from-disk-latency-spikes

The problem is compounded when we remove multiple members and each newly elected leader also happens to have the next-largest ordinal, because every removal then triggers another election.

Instead, if we removed only non-leader members, we might avoid these disruptions.
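
A minimal sketch of that selection rule, to make the idea concrete. This is not the operator's code: the `Member` type, the `ordinal` and `pickMemberToRemove` helpers, and the assumption that member names end in `-<ordinal>` are all hypothetical. It still prefers the largest ordinal, but skips whichever member is currently the leader.

```go
package main

import (
	"strconv"
	"strings"
)

// Member is a hypothetical, simplified view of an etcd cluster member.
type Member struct {
	ID   uint64
	Name string // assumed to end in "-<ordinal>", e.g. "my-cluster-2"
}

// ordinal extracts the trailing numeric ordinal from a member name.
// It returns -1 if the name has no parsable ordinal suffix.
func ordinal(name string) int {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return -1
	}
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return -1
	}
	return n
}

// pickMemberToRemove returns the non-leader member with the largest
// ordinal. The boolean is false when no non-leader member exists,
// for example when only the leader is left.
func pickMemberToRemove(members []Member, leaderID uint64) (Member, bool) {
	var best Member
	found := false
	for _, m := range members {
		if m.ID == leaderID {
			continue // never pick the current leader
		}
		if !found || ordinal(m.Name) > ordinal(best.Name) {
			best = m
			found = true
		}
	}
	return best, found
}
```

With members `c-0`, `c-1`, `c-2` and `c-2` as leader, this picks `c-1`. The leader ID would have to be re-read before each removal, since leadership can change between steps. The sketch also leaves open what to do when the remaining ordinals no longer form a contiguous range.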

wallrj · Nov 26 '19 17:11