Election problem when merging from partitions
Take a cluster [A,B,C,D,E] where A is both the coordinator and the leader. The cluster partitions into [A,B,C] and [D,E], and A is still the leader. The partitions then merge back as [D,A,B,C,E], so D becomes the coordinator and starts a voting round with term + 1. In the meantime, A broadcasts a LeaderElectedMessage telling everyone there is a leader. D receives the message and stops the voting thread without checking the result of tryAdvanceTermAndLeader. A receives the VoteRequest from D, resigns from leadership, and votes for the new term. The cluster is left with no leader and no running election thread.
I think checking the result of tryAdvanceTermAndLeader before stopping the voting thread could solve the problem. Or could the new coordinator have a way to check whether there is an existing leader?
It seems this is not that easy to fix. Once started, the voting thread advances the term anyway, and once the term has advanced the election must be completed. So we can't stop the voting thread when handling the LeaderElectedMessage, unless setting the leader also stops the voting thread from advancing the term and from sending any vote requests for the next term.
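The sequence reported above can be replayed in a small single-threaded model. This is only a sketch to make the ordering concrete: the `Node` class, its fields, and the starting term of 1 are assumptions made for illustration, and none of it is jgroups-raft code.

```java
public class MergeElectionRace {
    /** Illustrative stand-in for one member's election state (not the real classes). */
    static final class Node {
        final String name;
        long term;
        String leader;                 // null when no leader is known
        boolean votingThreadRunning;

        Node(String name, long term, String leader) {
            this.name = name;
            this.term = term;
            this.leader = leader;
        }

        // The new coordinator starts a voting round with term + 1.
        long startVoting() {
            votingThreadRunning = true;
            leader = null;
            return ++term;
        }

        // Handling as described in the report: the voting thread is stopped
        // even when the term/leader update is rejected as stale.
        void onLeaderElectedNaive(long msgTerm, String msgLeader) {
            if (msgTerm >= term) {
                term = msgTerm;
                leader = msgLeader;
            }
            votingThreadRunning = false;
        }

        // A vote request with a higher term makes the current leader step down.
        void onVoteRequest(long requestTerm) {
            if (requestTerm > term) {
                term = requestTerm;
                leader = null;
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A", 1, "A");   // leader of the majority partition
        Node d = new Node("D", 1, null);  // coordinator after the merge

        long newTerm = d.startVoting();   // D moves to term 2
        d.onLeaderElectedNaive(1, "A");   // stale message from A still stops D's voting
        a.onVoteRequest(newTerm);         // A resigns on seeing the higher term

        System.out.println("A leader=" + a.leader + ", D leader=" + d.leader
                + ", D voting=" + d.votingThreadRunning);
    }
}
```

After the three steps neither node knows a leader and D is no longer voting, which matches the stuck state described in the report.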
Hey, @yfei-z. Thanks again! The node should only stop the voting thread if the leader and term update goes through. That is, it should ignore messages with a smaller term. I'll try to look into it this week, as I'll be on PTO until the 20th.
Some thoughts off the top of my head. The LeaderElected message is delivered in either the [A, B, C] view or the [A, B, C, D, E] view. I think JGroups wouldn't deliver a message in different views. Additionally, the membership order in a view is deterministic. The members are sorted after the partition heals. This means A should be the coordinator before and after the merge.
Nevertheless, I'll try creating some tests and updating the handling of LeaderElected messages. I think the node shouldn't blindly apply the message.
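A minimal sketch of the handling suggested here, assuming tryAdvanceTermAndLeader returns false for a message with a smaller term. The class name, the method signatures, and the String leader are hypothetical placeholders, not the actual jgroups-raft API.

```java
public class LeaderElectedGuard {
    private long currentTerm;
    private String leader;                // null when no leader is known
    private boolean votingThreadRunning;

    LeaderElectedGuard(long initialTerm) {
        this.currentTerm = initialTerm;
    }

    // Assumed semantics: reject a stale term, otherwise record term and leader.
    synchronized boolean tryAdvanceTermAndLeader(long term, String newLeader) {
        if (term < currentTerm) {
            return false;
        }
        currentTerm = term;
        leader = newLeader;
        return true;
    }

    // Stand-in for the coordinator starting an election with term + 1.
    synchronized long startVoting() {
        votingThreadRunning = true;
        leader = null;
        return ++currentTerm;
    }

    // Stop voting only when the leader/term update was actually applied.
    synchronized boolean handleLeaderElected(long term, String newLeader) {
        boolean applied = tryAdvanceTermAndLeader(term, newLeader);
        if (applied) {
            votingThreadRunning = false;
        }
        return applied;
    }

    synchronized boolean isVotingThreadRunning() { return votingThreadRunning; }
    synchronized String leader() { return leader; }
    synchronized long term() { return currentTerm; }

    public static void main(String[] args) {
        LeaderElectedGuard d = new LeaderElectedGuard(1);
        d.startVoting();                                   // D is now at term 2
        boolean applied = d.handleLeaderElected(1, "A");   // stale message from A
        System.out.println("applied=" + applied
                + ", voting=" + d.isVotingThreadRunning());
    }
}
```

With this guard a message carrying an older term leaves the voting thread running, so the election that D already started can still complete.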
Hi @jabolina, I think the order of members in the MergeView depends on GMS.membership_change_policy. The default policy merges the subgroups by sorting the randomly generated addresses, so the result could be in any order.
I also think that checking the current term before applying the elected leader is not good enough, because the voting thread forces the term to increase in a loop. Even if the term has not increased yet when the leader is set, the voting thread could still increase it before the thread has actually stopped.
ELECTION2 does not seem to have a problem in this case. B and C might ignore the PreVoteRequest at first (NAKACK2), but after the retransmission D can collect all the PreVoteResponses and decide to start the voting thread, which happens after the LeaderElectedMessage from A.
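One possible way to close that window, sketched under the assumption that the term bump and the leader update can share a lock: the voting loop asks for permission before every round, and accepting a leader revokes that permission in the same critical section. `VotingRoundGate` and its methods are hypothetical names for illustration only, not an existing jgroups-raft class or the maintainers' chosen fix.

```java
import java.util.OptionalLong;

public class VotingRoundGate {
    private final Object lock = new Object();
    private long term;
    private String leader;            // null when no leader is known
    private boolean votingStopped;

    VotingRoundGate(long initialTerm) {
        this.term = initialTerm;
    }

    // Called by the voting loop before each round. Returns the new term,
    // or empty when a leader was accepted and no further round may start.
    OptionalLong beginNextRound() {
        synchronized (lock) {
            if (votingStopped) {
                return OptionalLong.empty();
            }
            leader = null;
            return OptionalLong.of(++term);
        }
    }

    // Called when a LeaderElected message arrives. A stale term is rejected;
    // otherwise the leader is set and later term bumps are blocked atomically.
    boolean acceptLeader(long msgTerm, String newLeader) {
        synchronized (lock) {
            if (msgTerm < term) {
                return false;
            }
            term = msgTerm;
            leader = newLeader;
            votingStopped = true;
            return true;
        }
    }

    long term() { synchronized (lock) { return term; } }
    String leader() { synchronized (lock) { return leader; } }

    public static void main(String[] args) {
        // Leader message arrives before the loop bumps the term.
        VotingRoundGate early = new VotingRoundGate(1);
        System.out.println("accepted=" + early.acceptLeader(1, "A")
                + ", nextRound=" + early.beginNextRound().isPresent());

        // Leader message arrives after the loop already bumped the term.
        VotingRoundGate late = new VotingRoundGate(1);
        late.beginNextRound();
        System.out.println("accepted=" + late.acceptLeader(1, "A")
                + ", nextRound=" + late.beginNextRound().isPresent());
    }
}
```

Because both operations hold the same lock, either the leader is set first and the loop never advances the term again, or the term advances first and the older LeaderElected message is rejected so the election continues.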