Cluster netsplit recovery

Open andydunstall opened this issue 9 months ago • 0 comments

Say you have a cluster with 6 nodes, and a network partition leaves one half of the cluster unable to talk to the other half.

Currently Piko will end up with 2 smaller clusters, where each considers the nodes in the other as unreachable or no longer part of the cluster.

To ensure the cluster recovers once the netsplit heals, each node should periodically attempt to gossip with any unknown nodes. For example, when service discovery is configured using DNS (such as a headless service on Kubernetes), the nodes can re-resolve the domain, check whether it returns any nodes that they don't consider part of the cluster, and attempt to contact those nodes.
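A rough sketch of what that periodic check could look like in Go. This is not Piko's actual code: `unknownAddrs`, `rejoinLoop`, and the `known` and `contact` callbacks are hypothetical names used for illustration, and the sketch assumes cluster members can be compared by the address strings that DNS returns.

```go
package main

import (
	"context"
	"net"
	"sort"
	"time"
)

// unknownAddrs returns the resolved addresses that are not in the known
// member set, de-duplicated and sorted so the result is deterministic.
func unknownAddrs(resolved []string, known map[string]struct{}) []string {
	seen := make(map[string]struct{})
	var unknown []string
	for _, addr := range resolved {
		if _, ok := known[addr]; ok {
			continue // already considered part of the cluster
		}
		if _, dup := seen[addr]; dup {
			continue // DNS returned the same address twice
		}
		seen[addr] = struct{}{}
		unknown = append(unknown, addr)
	}
	sort.Strings(unknown)
	return unknown
}

// rejoinLoop re-resolves host every interval and hands each address that is
// not a known member to contact (hypothetical: e.g. trigger a gossip join).
// It returns when ctx is cancelled.
func rejoinLoop(
	ctx context.Context,
	host string,
	interval time.Duration,
	known func() map[string]struct{},
	contact func(addr string),
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			addrs, err := net.DefaultResolver.LookupHost(ctx, host)
			if err != nil {
				continue // transient DNS failure; retry on the next tick
			}
			for _, addr := range unknownAddrs(addrs, known()) {
				contact(addr)
			}
		}
	}
}
```

Each node would run something like `go rejoinLoop(ctx, host, interval, known, contact)` in the background. After a netsplit the other half's addresses would show up as unknown on the next tick and be contacted again, which gives the two smaller clusters a chance to merge.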

andydunstall, May 17 '24 07:05