jgroups-raft icon indicating copy to clipboard operation
jgroups-raft copied to clipboard

Leader left before the coordinator apply the election result

Open yfei-z opened this issue 1 year ago • 5 comments

Regarding https://github.com/jgroups-extras/jgroups-raft/issues/259, my test case still fails in 1.0.13.

	@Test
	void bug2(@TempDir(cleanup = CleanupMode.ALWAYS) Path tmp) throws Exception {
		String clusterName = "node-cluster", dir = tmp.toString();

		List<RaftNode> nodes = raftChannels("A,B,C", dir).stream().map(RaftNode::new).toList();
		try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {
			a.getCh().connect(clusterName); a.untilCoord(3);
			b.getCh().connect(clusterName); assertEquals("A", b.untilLeader(3));
			c.getCh().connect(clusterName); assertEquals("A", c.untilLeader(3));

			a.setAsync("cmd1".getBytes()).join();
			c.getCh().disconnect();
			a.setAsync("cmd2".getBytes()).join();

			assertEquals(2, a.lastAppended());
			assertEquals(2, b.lastAppended());
			assertEquals(1, c.lastAppended());
		}

		nodes = raftChannels("A,B,C", dir, t -> configProtocol(t, "raft.ELECTION", ELECTION.class))
				.stream().map(RaftNode::new).toList();
		try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {

			ELECTION election = c.getCh().stack().findProtocol(ELECTION.class);
			election.getSendLeader().blockInterruptibly(); // block the election thread before apply the elected leader

			c.getCh().connect(clusterName); c.untilCoord(3); // C become the coordinator
			b.getCh().connect(clusterName); // C: existing view: [C|0] (1) [C], new view: [C|1] (2) [C, B], result: reached
			a.getCh().connect(clusterName); // C: existing view: [C|1] (2) [C, B], new view: [C|2] (3) [C, B, A], result: no_change
			c.untilMembers(3, 3); assertEquals(List.of("C", "B", "A"), c.members()); // all up

			election.getSendLeader().untilWaiters(1, 3); // election thread has being blocked
			b.getCh().disconnect(); // C: existing view: [C|2] (3) [C, B, A], new view: [C|3] (2) [C, A], result: no_change

			c.untilMembers(2, 3); // view has being updated in C
			assertEquals(List.of("C", "A"), c.members());
			election.getSendLeader().unblock(); // unblock the election thread

			for (var t : List.of(c, a)) {
				assertEquals("B", t.untilLeader(3)); // TODO leader is a disconnected node
				assertThrows(RaftLeaderException.class, () -> t.setAsync("cmd3".getBytes())).printStackTrace();
			}
		}
	}

I found the fixing code where under reached and leader_lost cases which stop the existing voting thread before start a new one, but in my test case it's no_change.

yfei-z avatar Jun 12 '24 09:06 yfei-z

The test https://github.com/jgroups-extras/jgroups-raft/blob/6f3a9ee37fd30f31527c89a7ecc0f50240ad8061/tests/junit-functional/org/jgroups/tests/election/ViewChangeElectionTest.java#L55 was supposed to cover this scenario. I'll take a look today and get back. Thanks for reporting!

jabolina avatar Jun 12 '24 12:06 jabolina

We have different blocking points, I block the election thread just before raft.setLeaderAndTerm(leader, new_term);, and you block the message sending where after the leader has been applied locally.

yfei-z avatar Jun 13 '24 04:06 yfei-z

I've looked into it and was trying to find a solution. The fact the block is the setLeaderAndTerm makes the difference. Likely, the solution will (as you commented on the other issue) serialize the handling of view updates. Otherwise, we calculate the effect of the view change without a final result in place. In this example, if the voting thread is pending, we have no_change, but if the thread has finished, it is leader_lost..

Since we can't delay the view processing, we can chain a CompletableFuture with a dedicated single-threaded executor for handling the events in order. We'll need some updates on the voting process to complete the future and notify the chain. I would avoid having a thread polling a queue since view changes are not supposed to occur frequently. Changes on the test side are necessary when checking if the voting thread is running since it would run asynchronously and not after processing the view.

jabolina avatar Jun 17 '24 12:06 jabolina

I think just make sure the voting thread has processed the latest view before it's stopped, and the view change event thread don't try to stop the voting thread because there is no guarantee. I just made a pull request, hope it will give you some idea.

yfei-z avatar Jun 18 '24 09:06 yfei-z

Since unset the leader only depend on the view change event of lost majority, so if a participant node got the delayed elected leader message after the view change event then it will keep the leader without majority reached. After the PR, the coordinator works fine but didn't cover this scenario.

@Test
void bug4() throws Exception {
	String clusterName = "node-cluster";
	List<RaftNode> nodes = raftChannels("A,B,C,D,E", null, t -> configProtocol(t, "raft.ELECTION", ELECTION.class))
			.stream().map(RaftNode::new).toList();
	try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {

		ELECTION election = a.getCh().stack().findProtocol(ELECTION.class);
		election.getSendLeader().block(); // block the election thread before apply the elected leader

		a.getCh().connect(clusterName); a.untilCoord(3); // A become the coordinator
		b.getCh().connect(clusterName);
		c.getCh().connect(clusterName); // majority reached, start a voting thread
		election.getSendLeader().untilWaiters(1, 3); // election thread has being blocked

		c.getCh().disconnect(); // C left
		a.untilMembers(2, 3); // view changed in A
		b.untilMembers(2, 3); // view changed in B

		election.getSendLeader().unblock(); // unblock the election thread

		// before https://github.com/jgroups-extras/jgroups-raft/pull/284
//		assertEquals("A", a.untilLeader(3)); // TODO A become the leader without majority reached
//		assertThrows(TimeoutException.class, () -> a.setAsync("cmd1".getBytes()).get(3, SECONDS));

		election.untilVotingThreadStop(3);
		assertNull(a.leader());
		assertThrows(RaftLeaderException.class, () -> a.setAsync("cmd1".getBytes()).get()).printStackTrace();

		b.untilLeader("A", 3); // TODO got electedLeader from A after view change event, miss the chance to remove the leader
		assertThrows(ExecutionException.class, () -> b.setAsync("cmd1".getBytes()).get()).printStackTrace();
	}
}

yfei-z avatar Jun 19 '24 04:06 yfei-z