piko
Cluster netsplit recovery
Say you have a cluster with 6 nodes, and a network partition leaves one half of the cluster unable to talk to the other half.
Currently, Piko will end up with 2 smaller clusters, where each considers the nodes in the other as unreachable or no longer part of the cluster.
To ensure the cluster recovers once the netsplit heals, each node should periodically attempt to gossip with any unknown nodes. For example, when service discovery is configured using DNS (such as a headless service on K8s), each node can re-resolve the domain, check whether it returns any addresses the node doesn't consider part of the cluster, and attempt to contact those nodes.
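A rough sketch of that loop in Go, to make the proposal concrete. This is not Piko's existing code: `unknownAddrs`, `rejoinLoop`, and the `members` and `join` callbacks are hypothetical names, and the sketch assumes cluster members can be identified by the IP addresses DNS returns. Only the standard library resolver is used.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"time"
)

// unknownAddrs returns the resolved addresses that are not in the known
// member set, de-duplicated and sorted. The known set should include the
// local node's own address so a node never tries to join itself.
func unknownAddrs(resolved []string, known map[string]struct{}) []string {
	seen := make(map[string]struct{})
	var unknown []string
	for _, addr := range resolved {
		if _, ok := known[addr]; ok {
			continue
		}
		if _, dup := seen[addr]; dup {
			continue
		}
		seen[addr] = struct{}{}
		unknown = append(unknown, addr)
	}
	sort.Strings(unknown)
	return unknown
}

// rejoinLoop re-resolves host every interval and calls join for each address
// that is not currently a known member. members and join are hypothetical
// hooks into the gossip layer, supplied by the caller.
func rejoinLoop(
	ctx context.Context,
	interval time.Duration,
	host string,
	members func() map[string]struct{},
	join func(addr string) error,
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			addrs, err := net.DefaultResolver.LookupHost(ctx, host)
			if err != nil {
				continue // DNS may fail during the partition; retry next tick.
			}
			for _, addr := range unknownAddrs(addrs, members()) {
				// Errors are ignored here: the node may still be
				// unreachable, and the next tick will try again.
				_ = join(addr)
			}
		}
	}
}

func main() {
	// One side of a split cluster knows three members, while DNS still
	// returns all six addresses.
	known := map[string]struct{}{"10.0.0.1": {}, "10.0.0.2": {}, "10.0.0.3": {}}
	resolved := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5", "10.0.0.6"}
	fmt.Println(unknownAddrs(resolved, known))
}
```

Retrying on every tick keeps the sketch simple: while the partition lasts the join attempts just fail, and once it heals the next tick reaches the other half. A real implementation would probably want to add jitter to the interval and compare on node ID instead of IP address.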