akka-raft icon indicating copy to clipboard operation
akka-raft copied to clipboard

Nodes should not forget who they voted for

Open colin-scott opened this issue 9 years ago • 11 comments

When a Candidate receives an AppendEntries message from a leader in the same term, it correctly steps down to Follower.

However, it currently forgets who it voted for that term.

This can lead to two leaders being elected in the same term: the Candidate that stepped down may now vote for another Candidate who will eventually become the second leader.

This behavior can occur even after merging in #55.

I have a test case that triggers this behavior. Here are the messages delivered to trigger this bug [format is sender,receiver,message, and the cluster has 4 nodes in it]:

MsgEvent(deadLetters,raft-member-4,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4))))
MsgEvent(deadLetters,raft-member-1,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4))))
MsgEvent(deadLetters,raft-member-1,Timer(election-timer,ElectionTimeout,false,1))
MsgEvent(deadLetters,raft-member-4,Timer(election-timer,ElectionTimeout,false,1))
MsgEvent(deadLetters,raft-member-3,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4))))
MsgEvent(deadLetters,raft-member-1,Timer(election-timer,ElectionTimeout,false,3))
MsgEvent(deadLetters,raft-member-4,Timer(election-timer,ElectionTimeout,false,3))
MsgEvent(raft-member-4,raft-member-4,BeginElection)
MsgEvent(deadLetters,raft-member-2,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4))))
MsgEvent(raft-member-4,raft-member-1,RequestVote(Term(2),Actor[akka://new-system-0/user/raft-member-4#-1663394226],Term(0),0))
MsgEvent(deadLetters,raft-member-3,Timer(election-timer,ElectionTimeout,false,1))
MsgEvent(deadLetters,raft-member-2,Timer(election-timer,ElectionTimeout,false,1))
MsgEvent(deadLetters,raft-member-3,Timer(election-timer,ElectionTimeout,false,3))
MsgEvent(raft-member-3,raft-member-3,BeginElection)
MsgEvent(raft-member-3,raft-member-3,BeginElection)
MsgEvent(raft-member-4,raft-member-4,BeginElection)
MsgEvent(deadLetters,raft-member-2,Timer(election-timer,ElectionTimeout,false,3))
MsgEvent(raft-member-1,raft-member-4,VoteCandidate(Term(2)))
MsgEvent(raft-member-4,raft-member-2,RequestVote(Term(2),Actor[akka://new-system-0/user/raft-member-4#-1663394226],Term(0),0))
MsgEvent(raft-member-2,raft-member-4,VoteCandidate(Term(2)))
MsgEvent(raft-member-4,raft-member-4,ElectedAsLeader)
MsgEvent(raft-member-4,raft-member-1,AppendEntries(term:Term(2),prevLog:(Term(1),1),entries:List()))
MsgEvent(raft-member-3,raft-member-1,RequestVote(Term(2),Actor[akka://new-system-0/user/raft-member-3#-463317739],Term(0),0))
MsgEvent(raft-member-4,raft-member-2,AppendEntries(term:Term(2),prevLog:(Term(1),1),entries:List()))
MsgEvent(raft-member-1,raft-member-3,VoteCandidate(Term(2)))
MsgEvent(raft-member-3,raft-member-2,RequestVote(Term(2),Actor[akka://new-system-0/user/raft-member-3#-463317739],Term(0),0))
MsgEvent(raft-member-2,raft-member-3,VoteCandidate(Term(2)))

At this point, both raft-member-3 and raft-member-4 are leaders for Term(2).

Let me know if you want more detailed reproducing steps.

colin-scott avatar Aug 04 '15 19:08 colin-scott

I don't quite understand how two leaders can be elected. When the the candidate that stepped down will start a new election he will increment his term and will start sending requests to all other members with a new term in a request and soon previous leader will discover that his term is out of date and will revert to the follower state. This behavior of leader however is not implemented because now he doesn't revert to follower after receiving new term number.

dmitraver avatar Aug 06 '15 08:08 dmitraver

@dmitraver in this particular test case, when the candidate that stepped down starts a new election, it does indeed increment its term and send requests to all members. However, those requests are delayed indefinitely, so others do not step down immediately.

And, regardless of how long the requests are delayed for: Raft's "LeaderSafety" invariant states that at no point in time, should there be two nodes who both believe themselves to be leader for the same Term. You're quite right this should eventually correct itself, but in the interrim you could have a very bad situation, e.g. the two leaders could overwrite eachother's committed entries, and end up with corrupted state machines by the time the leaders right themselves out.

Does that make sense?

colin-scott avatar Aug 06 '15 10:08 colin-scott

Yep, I think that checking for correct term should be enough to solve this problem because members should also reject AppendEntries if they come from different term. Correct me if I'm wrong. Anyway I will fix the term handling for leader and see what is going on then and as I understand your test can be used to check this behavior. Could you please tell me how can I run it myself?

dmitraver avatar Aug 06 '15 13:08 dmitraver

Hm, I don't understand your point. Members currently do reject AppendEntries if they come from a lower term [see Follower and Candidate].

Anyhow, here are the reproducing steps for the execution:

git clone -b raft-56 [email protected]:NetSys/sts2-applications.git
git clone [email protected]:NetSys/sts2-experiments.git experiments
sbt assembly
java -d64 -Xmx15g -cp target/scala-2.11/randomSearch-assembly-0.1.jar Main

Near the bottom of the console output, you should see:

Violation?: Some(RaftViolation(Map(ElectionSafety:raft-member-1:raft-member-4 -> Set(raft-member-4, raft-member-1))))

Meaning that raft-member-1 and raft-member-4 are leaders for the same term.

I also added println's to RaftActor to show what the state of each actor is before and after each message receive. Look for BEFORE RECEIVE and AFTER RECEIVE markers.

Note that this snapshot of the akka-raft code differs slightly from the master branch -- I wanted to show that this bug can be triggered in isolation from the other issues.

The changes are:

  • I merged your pull request #55 [reject if term < currentTerm, and for all nodes other than leader, step down if term > currentLeader]
  • I added code in Leader.scala to step down if term > currentTerm for RequestVote and VoteCandidate
  • Fix https://github.com/ktoso/akka-raft/issues/29 [allow nodes to reissue votes for the same candidate in the same term] and fix https://github.com/ktoso/akka-raft/issues/45 [distinguish votes from the same follower]

colin-scott avatar Aug 06 '15 19:08 colin-scott

As far as I can tell, the only remaining issue here is with this line of code:

https://github.com/ktoso/akka-raft/blob/master/src/main/scala/pl/project13/scala/akka/raft/Candidate.scala#L67

The Candidate forgets who it voted for when it steps down to Follower, and then votes again in the same term for someone else.

colin-scott avatar Aug 06 '15 19:08 colin-scott

Excellent work Colin. I will investigate your changes on a weekend and will update my pull request accordingly.

dmitraver avatar Aug 07 '15 06:08 dmitraver

For what it's worth, I have a (non-pull-request-worthy) fix for this issue here:

https://github.com/NetSys/sts2-applications/commit/ef443f51569f8167bbb548157dc9cfd312c1bb25

colin-scott avatar Aug 08 '15 03:08 colin-scott

@colin-scott I totally think all those fixes are definitely PR worthy actually! Would you not want to send in PRs with those (I'd like to give credit where it's due for the fixes this way)? Thanks for the continued research/work on it!

ktoso avatar Aug 08 '15 12:08 ktoso

Sure I can send in PRs, I just don't have time right now to write proper unit tests. Do you still want me to send them?

colin-scott avatar Aug 08 '15 14:08 colin-scott

I'd be more than happy to see those small fixes as PRs and would then add more and more tests (best to uncover what this specific PR fixes). I.e. I'd take it upon myself to add tests proving that the patch makes sense as described :)

ktoso avatar Aug 08 '15 14:08 ktoso

OK, sounds good -- I'll do that sometime today

colin-scott avatar Aug 08 '15 14:08 colin-scott