Split-brain

Open Ephron-WL opened this issue 1 year ago • 1 comments

I noticed that when a 5-node cluster was reduced to 2 nodes, the 2 remaining nodes continued to operate normally. This is no doubt related to my bug fix that ensures the cluster continues to operate when it is degraded, whether or not a quorum can be established. I'm not sure whether this is a problem. A fundamental principle of quorum-based clusters is that a partition without quorum should stop accepting state changes, because the nodes it cannot reach may themselves hold a quorum and should then be authoritative. Would it be reasonable to stop serving API calls if the member count falls below a quorum?
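
To make the arithmetic concrete, here is a minimal sketch of a majority check in Python. It is illustrative only and not SlimCluster's API: `quorum_size` and `has_quorum` are hypothetical names, and it assumes quorum means a strict majority of the configured member count rather than of the currently reachable members.

```python
def quorum_size(configured_members: int) -> int:
    """Smallest strict majority of the configured cluster size (assumed definition)."""
    if configured_members < 1:
        raise ValueError("cluster must have at least one configured member")
    return configured_members // 2 + 1


def has_quorum(reachable_members: int, configured_members: int) -> bool:
    """True when the reachable members (including this node) form a majority."""
    return reachable_members >= quorum_size(configured_members)


print(quorum_size(5))    # 3
print(has_quorum(2, 5))  # False: the 2 surviving nodes are a minority
print(has_quorum(3, 5))  # True
```

Under that assumption, the 2 surviving nodes of a 5-node cluster are below the majority of 3, so they cannot rule out that the other 3 nodes are still running as a group.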

Ephron-WL avatar Jan 16 '25 01:01 Ephron-WL

Yes, I agree. The cluster should go into some error/undefined state. That state should then inform the user code, which can either stop serving requests or continue; ideally this would be a setting that lets users decide depending on their use case. Sometimes it would be okay to serve read-only requests, with the caveat that the reduced cluster's state might be stale (split brain). For other use cases, stale reads might not be acceptable. In either case, write/mutating requests could all fail while the cluster is in the error/undefined state, since there is no quorum.
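
A rough sketch of what such a setting could look like, written in Python for brevity. Every name here (`NoQuorumPolicy`, `NoQuorumError`, `RequestGate`) is hypothetical and none of it is existing SlimCluster code; it assumes quorum is a strict majority of the configured member count.

```python
from enum import Enum


class NoQuorumPolicy(Enum):
    """Hypothetical user setting: what to do with reads when quorum is lost."""
    REJECT_ALL = "reject_all"                # stop serving every request
    ALLOW_STALE_READS = "allow_stale_reads"  # serve reads that may be stale


class NoQuorumError(Exception):
    """Raised when a request is refused because the cluster has no quorum."""


class RequestGate:
    def __init__(self, configured_members: int,
                 policy: NoQuorumPolicy = NoQuorumPolicy.REJECT_ALL) -> None:
        self.configured_members = configured_members
        self.policy = policy
        # Updated by membership tracking; starts with all members reachable.
        self.reachable_members = configured_members

    def has_quorum(self) -> bool:
        return self.reachable_members >= self.configured_members // 2 + 1

    def check(self, is_write: bool) -> None:
        """Raise NoQuorumError if the request must not be served right now."""
        if self.has_quorum():
            return
        if is_write:
            # Writes always fail without quorum, regardless of policy.
            raise NoQuorumError("write rejected: cluster has no quorum")
        if self.policy is NoQuorumPolicy.REJECT_ALL:
            raise NoQuorumError("read rejected: cluster has no quorum")
        # ALLOW_STALE_READS: the read proceeds, but the data may be stale.


gate = RequestGate(configured_members=5, policy=NoQuorumPolicy.ALLOW_STALE_READS)
gate.reachable_members = 2      # cluster degraded to 2 of 5 nodes
gate.check(is_write=False)      # allowed under this policy (possibly stale)
try:
    gate.check(is_write=True)
except NoQuorumError as err:
    print(err)                  # write rejected: cluster has no quorum
```

The point of the split is that only the read path is configurable; writes are refused whenever quorum is missing.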

Also, please see my comment on your PR #28, as I think one change is related and perhaps not desired.

zarusz avatar Jan 16 '25 22:01 zarusz