NuRaft
The safety of dynamically adjustable Custom Quorum
Basically, it's easy to see that the flexible quorum is safe when the config is static. But is Leader Completeness guaranteed when Qc and Qe are dynamically adjustable?
For example, I have a 5-node cluster with Qc(3) and Qe(3) (the default algorithm), and then change the config to Qc(5) and Qe(1). How can the cluster always elect a valid leader when up to 2 nodes may have incomplete Raft logs, yet a node can become leader with only its own vote?
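To make the concern concrete, here is a minimal brute-force sketch (not NuRaft code; the function name `quorums_intersect` is made up for illustration). It checks whether every commit quorum of size Qc overlaps every election quorum of size Qe, which is the property a static flexible-quorum config relies on. Each static config passes on its own, but entries committed under the old Qc(3) are not covered by an election held under the new Qe(1).

```python
from itertools import combinations


def quorums_intersect(n, qc, qe):
    """Return True if every commit quorum of size qc shares at least one
    node with every election quorum of size qe in an n-node cluster."""
    nodes = range(n)
    return all(
        set(c) & set(e)  # an empty intersection is falsy
        for c in combinations(nodes, qc)
        for e in combinations(nodes, qe)
    )


# Static configs: the old and the new config are each fine on their own.
print(quorums_intersect(5, 3, 3))  # True
print(quorums_intersect(5, 5, 1))  # True

# Mixed case: entries committed under the old Qc(3),
# election held under the new Qe(1).
print(quorums_intersect(5, 3, 1))  # False
```

The brute-force check agrees with the usual shorthand Qc + Qe > N; it is spelled out here only so the failing case can be seen directly.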
Hi @Fullstop000, we don't think dynamically changing the quorum size is safe, although we haven't proved it rigorously. In eBay's use cases, the quorum size basically remains unchanged, and we adjust it only for manual recovery in fatal cases.
If you want to change it on the fly, how about increasing one of them first (Qc(3)->Qc(5)), waiting for at least one commit, and then decreasing the other (Qe(3)->Qe(1))? Still, I'm not sure whether it is safe. @genezhang Please let me know your thoughts.
Thanks.
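The ordering suggested above can be illustrated with a small toy model. This is only a sketch of the reasoning, not a proof and not NuRaft's API: the helper names are hypothetical, logs are reduced to a single "last index" per node, terms are ignored, and the propagation of the config change itself is not modeled. Under those assumptions, a node missing a committed entry could win an election only if at least Qe nodes (itself included) are equally stale.

```python
def stale_nodes(last_index, commit_index):
    """Nodes whose log does not yet contain the latest committed entry."""
    return [node for node, idx in enumerate(last_index) if idx < commit_index]


def safe_to_use_election_quorum(last_index, commit_index, qe):
    """Toy check (assumption: terms ignored, index-only log comparison).
    A stale candidate can only collect votes from nodes that are no more
    up to date than itself, so it needs at least qe stale nodes to win."""
    return len(stale_nodes(last_index, commit_index)) < qe


# Entry 7 was committed under Qc(3): nodes 3 and 4 are still behind.
before = [7, 7, 7, 4, 5]
print(stale_nodes(before, 7))                     # [3, 4]
print(safe_to_use_election_quorum(before, 7, 3))  # True  (Qe(3) still fine)
print(safe_to_use_election_quorum(before, 7, 1))  # False (Qe(1) too early)

# Step 1: raise Qc to 5 and wait for one commit (entry 8 on all 5 nodes).
after = [8, 8, 8, 8, 8]
# Step 2: only now lower Qe to 1.
print(safe_to_use_election_quorum(after, 8, 1))   # True
```

In this model, the commit under Qc(5) is what removes the stale nodes, which is why the increase has to come before the decrease.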
@greensky00 Thank you for the reply. The approach you describe should be safe, since Leader Completeness is guaranteed in elections once an entry has been committed under Qc(5). And it seems that only decreasing Qe raises safety concerns.
I don't think dynamically changing the quorum size is safe. It's the same problem as membership changes: there is a way to make it right, but the easiest way is still to add/remove one at a time. A manual change could be used to recover a bad cluster, but yes, more diagnostic tools need to be added to support real production issues. For example, a log entry may crash all nodes when the state machine tries to apply it; in that case we should have a tool to remove that log entry from the log store and restart the nodes.