akka-raft
akka-raft copied to clipboard
Two Leaders Elected in the Same Term
Hello,
I have a test case that, as far as I can tell, causes akka-raft to violate Raft's "Election Safety" property (see Figure 3 from the paper), i.e. it appears that two leaders are elected for the same term.
The test case consists of the following external events:
- Start 9 RaftActors, named "raft-member-{1-9}".
- Bootstrap all but one of the RaftActors, i.e. send them
ChangeConfiguration
messages containing ActorRefs for all 9 RaftActors. raft-member-8 does not receive aChangeConfiguration
message.
Upon running the test, I see the following in the console output:
[INFO] [04/24/2015 23:52:24.490] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-9] Initializing election (among 9 nodes) for Term(2)
[INFO] [04/24/2015 23:52:24.492] [new-system-0-akka.actor.default-dispatcher-7] [akka://new-system-0/user/raft-member-7] Initializing election (among 9 nodes) for Term(2)
...
[INFO] [04/24/2015 23:52:24.497] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-7] Received vote by Actor[akka://new-system-0/user/raft-member-4#-2022599620]; Won election with 5 of 9 votes
[INFO] [04/24/2015 23:52:24.496] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-9] Received vote by Actor[akka://new-system-0/user/raft-member-2#-1430451575]; Won election with 5 of 9 votes
For what it's worth, rather than inspecting the console output to detect this bug, we took a distributed snapshot of all RaftActor's states and found that in the same snapshot raft-member-9 is in state
LeaderMeta(Actor[akka://new-system-0/user/raft-member-9],Term(2))
while
raft-member-7 is in state LeaderMeta(Actor[akka://new-system-0/user/raft-member-7],Term(2))
.
In our failing execution, we have the akka runtime deliver 27 total messages, including the 8 ChangeConfiguration
messages. The delivery order is as follows (format is sender,receiver,message):
null,raft-member-7,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-4,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-2,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-1,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-9,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-6,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
raft-member-9,raft-member-9,BeginElection
null,raft-member-3,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4))
raft-member-9,raft-member-1,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-9,raft-member-2,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-9,raft-member-6,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-1,raft-member-9,VoteCandidate(Term(0))
raft-member-6,raft-member-9,VoteCandidate(Term(0))
null,raft-member-5,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
raft-member-9,raft-member-9,BeginElection
raft-member-9,raft-member-3,(RequestVote,Term(2),raft-member-9,Term(0),0)
raft-member-7,raft-member-7,BeginElection
raft-member-7,raft-member-1,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-1,raft-member-1,BeginElection
raft-member-7,raft-member-5,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-7,raft-member-4,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-3,raft-member-9,VoteCandidate(Term(2))
raft-member-7,raft-member-7,BeginElection
raft-member-1,raft-member-7,VoteCandidate(Term(2))
raft-member-5,raft-member-7,VoteCandidate(Term(1))
raft-member-2,raft-member-9,VoteCandidate(Term(0))
raft-member-4,raft-member-7,VoteCandidate(Term(1))
Based on that delivery order, it appears that raft-member-9 receives votes from raft-member-{1,6,3,2,9}, and raft-member-7 receives votes from raft-member-{1,5,4,7}. A few things are strange about this: the votes received by raft-member-9 appear to be from different Terms; and raft-member-7 does not actually receive a quorum of votes (regardless of Term). I'm not exactly sure what the root cause is here.
We made akka's message scheduler deterministic so that you can easily reproduce the bug for yourself. Steps to reproduce:
$ git clone -b raft-leader-safety [email protected]:NetSys/sts2-applications.git
$ cd sts2-applications
$ git remote add interposition [email protected]:NetSys/sts2-interposition.git
$ git subtree pull --prefix=interposition interposition master
$ git clone [email protected]:NetSys/sts2-experiments.git experiments
$ sbt assembly
$ java -d64 -Xmx15g -cp target/scala-2.11/randomSearch-assembly-0.1.jar Main 2>&1 | tee console.out
From there you should be able to add logging statements and continue replaying as many times as needed.
We made a few small changes to akka-raft to generate this test case:
- We use a seeded random number generator to make the execution deterministic.
- We added a receive() override to RaftActor.scala to allow us to take distributed snapshots.
- We modified
Follower.scala to always begin an election
rather than check if
electionDeadline.isOverdue()
. This is again to make the execution deterministic (sinceelectionDeadline.isOverdue()
callsgettimeofday
), but that shouldn't affect correctness as far as I can tell.
Let me know if you have any questions about how we ran this test.
Thanks! -Colin
Excellent work Colin, looks really good. I need to find the time to go over your discovery!
I was able to minimize this test case further, by decreasing the cluster size.
Repro steps for smaller case:
$ git clone -b raft-shrunk-leader-safety [email protected]:NetSys/sts2-applications.git
$ cd sts2-applications
$ git clone [email protected]:NetSys/sts2-experiments.git experiments
$ sbt assembly
$ java -d64 -Xmx15g -cp target/scala-2.11/randomSearch-assembly-0.1.jar Main 2>&1 | tee console.out
Console output.
Messages delivered:
MsgEvent(deadLetters,raft-member-8,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(deadLetters,raft-member-7,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(deadLetters,raft-member-3,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(raft-member-8,raft-member-8,BasicFingerprint(BeginElection))
MsgEvent(raft-member-7,raft-member-7,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-3,BasicFingerprint((RequestVote,Term(1),raft-member-8,Term(0),0)))
MsgEvent(raft-member-7,raft-member-7,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-8,BasicFingerprint(BeginElection))
MsgEvent(deadLetters,raft-member-6,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(raft-member-7,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-7,Term(0),0)))
MsgEvent(raft-member-3,raft-member-8,BasicFingerprint(VoteCandidate(Term(1))))
MsgEvent(raft-member-3,raft-member-3,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-6,BasicFingerprint((RequestVote,Term(1),raft-member-8,Term(0),0)))
MsgEvent(raft-member-6,raft-member-8,BasicFingerprint(VoteCandidate(Term(0))))
MsgEvent(raft-member-3,raft-member-7,BasicFingerprint(VoteCandidate(Term(2))))
MsgEvent(raft-member-7,raft-member-8,BasicFingerprint((RequestVote,Term(2),raft-member-7,Term(0),0)))
At the end of this, raft-member-7 and raft-member-8 are both elected as leader in the same term.
@colin-scott Hi! I tried to run the test case that you provided but it appears that the code is changed and after running
git subtree pull --prefix=interposition interposition master
I get a lot of merge conflicts and I'm unable to run the test. I want to try fixing this issue and probably others too as I'm planning to use this library in my own project.
Hey @dmitraver,
Thanks for pointing out the issue. I corrected the repro steps above; if you start the repro steps from scratch it should work.
(The raft-shrunk-leader-safety
branch already has the correct dependencies, so there shouldn't be a need to pull in the subtree and merge)
Thanks! -Colin
@dmitraver for what it's worth, I think the root cause of this bug is described in this issue:
https://github.com/ktoso/akka-raft/issues/46
@colin-scott Thx Colin! I will look into that.
Thanks guys! Still wasn't able to resume work on it, a lot of stuff on Akka itself needs shipping soon so focused my efforts there :)
One note:
I'm planning to use this library in my own project.
Please don't if it's any real-life / production project – this implementation has been proven to not yet be correct so it would be a bad idea to use in a real project. Safety first!
@ktoso Akka is awesome tool! I just want to contribute to this project :) I've seen a lot of issues was opened previously so will try to fix as much as I can and make a PR. This is not intended to be used in production(maybe later) but more for my own akka studies.
Awesome! For learning things on distributed systems and Akka + fixing things this project is ideal I think :-) Looking forward to hear from you then – feel free to open/ask on issues if you have any trouble,
@ktoso Thanks for support :)
Calling this a "test case" is a bit of a misnomer. Think of it as a replayable execution. It replays by trying to deliver the same (equivalent) messages it observed in the original execution. If some of those messages become missing, e.g. after you modified the akka-raft code, it fails to replay properly. Note though that that doesn't necessarily mean that you fixed the bug.
I can show you how I ran the initial fuzz test if that would be easier. Or I can show you how to run the replayer in a mode that doesn't crash whenever it diverges.
On Thu, Jul 23, 2015 at 5:46 AM Dmitry Avershin [email protected] wrote:
@colin-scott https://github.com/colin-scott Hi Colin! I did some research recently and found that candidate doesn't recreate ElectionMeta when it receives a BeginElection message and therefore at the beginning of new election it has some votes that were received previously(possibly in the other terms too). I tried to change the following line https://github.com/ktoso/akka-raft/blob/master/src/main/scala/pl/project13/scala/akka/raft/Candidate.scala#L31 to
m.forNewElection.incVote
which should solve the issue but then your test starts failing with
Exception in thread "main" akka.dispatch.verification.ReplayException: Expected event (raft-member-9,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-9,Term(0),0))) at akka.dispatch.verification.ReplayScheduler.replay(ReplayScheduler.scala:124) at akka.dispatch.verification.RunnerUtils$.replayExperiment(RunnerUtils.scala:139)
I tried to find what causes this issue but stuck. Could you give me some insight what is going wrong?
— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124083867.
I think I fixed that issue, now your test shows no errors and passes successfully. The issue was with stale term number in VoteCandidate message. I will make a PR soon. Thanks for your support. Your tests were really helpful :)
2015-07-23 20:35 GMT+02:00 Colin Scott [email protected]:
Calling this a "test case" is a bit of a misnomer. Think of it as a replayable execution. It replays by trying to deliver the same (equivalent) messages it observed in the original execution. If some of those messages become missing, e.g. after you modified the akka-raft code, it fails to replay properly. Note though that that doesn't necessarily mean that you fixed the bug.
I can show you how I ran the initial fuzz test if that would be easier.
On Thu, Jul 23, 2015 at 5:46 AM Dmitry Avershin [email protected] wrote:
@colin-scott https://github.com/colin-scott Hi Colin! I did some research recently and found that candidate doesn't recreate ElectionMeta when it receives a BeginElection message and therefore at the beginning of new election it has some votes that were received previously(possibly in the other terms too). I tried to change the following line < https://github.com/ktoso/akka-raft/blob/master/src/main/scala/pl/project13/scala/akka/raft/Candidate.scala#L31
to
m.forNewElection.incVote
which should solve the issue but then your test starts failing with
Exception in thread "main" akka.dispatch.verification.ReplayException: Expected event (raft-member-9,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-9,Term(0),0))) at akka.dispatch.verification.ReplayScheduler.replay(ReplayScheduler.scala:124) at akka.dispatch.verification.RunnerUtils$.replayExperiment(RunnerUtils.scala:139)
I tried to find what causes this issue but stuck. Could you give me some insight what is going wrong?
— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124083867.
— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124202917.
Best wishes, Dmitry Avershin.