featurebase What happens when the Cluster Coordinator fails?

Hello,

What happens when the cluster coordinator node completely dies?

Can you replace it with a new cluster coordinator node?
What happens to the existing nodes whose gossip seeds point to the old/dead cluster coordinator node?
Is it possible to set up automatic leader election to replace the cluster coordinator instead of doing it "manually"?

Dec 16 '20 17:12 cozos

Hi @cozos, good questions.

If the coordinator dies and cannot be recovered, you will need to do the following:

1. Assign another, healthy node to be coordinator (see Changing the Coordinator).
2. Remove the dead node from the cluster (see Removing a Node).
3. (Optionally) add a new node to the cluster to return the cluster to its original size.

Note that these steps require a replication factor of at least 2, so that another node still holds a copy of the data that was on the dead node.
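
To make the order of these steps and the replication-factor precondition concrete, here is a minimal Python sketch. It is only an illustration: `recovery_plan` and its parameters are hypothetical names, it assumes every node other than the dead coordinator is healthy, and it does not call any real cluster API. It only lists the manual steps in order.

```python
from typing import List


def recovery_plan(nodes: List[str], dead_coordinator: str, replicas: int,
                  restore_size: bool = True) -> List[str]:
    """List the manual steps for replacing a dead coordinator, in order.

    Hypothetical helper: assumes all nodes except the dead coordinator
    are healthy, and only describes the steps rather than performing them.
    """
    # With fewer than 2 replicas there is no second copy of the dead node's data.
    if replicas < 2:
        raise ValueError("replication factor must be at least 2")

    healthy = [n for n in nodes if n != dead_coordinator]
    if not healthy:
        raise ValueError("no healthy node left to promote")

    steps = [
        f"make {healthy[0]} the coordinator",
        f"remove {dead_coordinator} from the cluster",
    ]
    if restore_size:
        # Optional: bring the cluster back to its original node count.
        steps.append("add a new node to restore the original cluster size")
    return steps
```

Calling `recovery_plan(["node1", "node2", "node3"], "node1", replicas=2)` returns the three steps with `node2` promoted, while `replicas=1` raises a `ValueError`.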

The gossip seed pointing to the old coordinator won't be a problem, as it is only used during startup. If you restart a node, you'll want to ensure that its seeds configuration contains at least one node which is still available. A good practice in general is to provide more than one node in seeds.
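
Before restarting a node, one way to check that its seed list still contains an available node is a small probe like the following. This is a rough sketch, not part of the product: the function names are hypothetical, the seeds are assumed to be `host:port` strings, and it assumes the gossip port accepts TCP connections, which may not hold for every setup.

```python
import socket
from typing import Callable, Iterable, List


def tcp_reachable(seed: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to a 'host:port' seed succeeds."""
    host, _, port = seed.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # Unreachable host, refused connection, or malformed seed string.
        return False


def live_seeds(seeds: Iterable[str],
               probe: Callable[[str], bool] = tcp_reachable) -> List[str]:
    """Return the seeds that the probe reports as reachable."""
    return [s for s in seeds if probe(s)]
```

If `live_seeds(...)` returns an empty list for a node's configured seeds, that node would have nothing to contact at startup, so its seed list should be updated before it is restarted. The `probe` parameter is injectable so the check can be exercised without touching the network.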

We are working on implementing automatic leader election, but I don't yet have information on when that would be available.

Dec 18 '20 04:12 travisturner

Thanks @travisturner for the in-depth response. I realize now that a lot of these questions are answered in TFA.

Out of curiosity, is there a particular reason automatic leader election is not a thing? Or is it something you guys haven't gotten around to yet?

One last question: is automatic dead node detection (i.e., detecting that a node has been unavailable for a while and rebalancing) on the roadmap? If not, any particular reason?

Thanks and feel free to close the issue if you want.

Dec 20 '20 04:12 cozos