raft-rs
Re-joining after a simulated node crash panics on trying to re-add an already existing node.
Describe the bug
Re-joining a leader to a cluster crashes with an attempt to re-add an already known node to the progress set.
Oct 31 15:49:52.769 ERRO e: The node 2 already exists in the voters set., raft_id: 1
0: backtrace::backtrace::libunwind::trace
at /Users/vsts/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /Users/vsts/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:47
3: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:36
4: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:200
5: std::panicking::default_hook
at src/libstd/panicking.rs:214
6: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:477
7: std::panicking::continue_panic_fmt
at src/libstd/panicking.rs:384
8: rust_begin_unwind
at src/libstd/panicking.rs:311
9: core::panicking::panic_fmt
at src/libcore/panicking.rs:85
10: core::result::unwrap_failed
at src/libcore/result.rs:1084
11: core::result::Result<T,E>::unwrap
at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/libcore/result.rs:852
12: uring::raft_node::RaftNode::on_ready
at src/raft_node.rs:325
13: uring::loopy_thing
at src/main.rs:635
14: uring::main::{{closure}}
at src/main.rs:693
To Reproduce
I set up a mini demo to replicate:
- clone https://github.com/wayfair-incubator/uring/tree/2416031ac34759f002a9a1539b5a2a54bbd84946
- start node 1:
cargo run -- -e 127.0.0.1:8081 -i 1 - wait for it to elect itself leader
- start node 2:
cargo run -- -e 127.0.0.1:8082 -i 2 -p 127.0.0.1:8081 - wait for it to join the cluster
- start node 3:
cargo run -- -e 127.0.0.1:8083 -i 3 -p 127.0.0.1:8081 - wait for it to join the cluster.
- Terminate node 1 (leader)
CTRL+C - node 2 or 3 will become leader
- restart node 1 and let it re-join the cluster
cargo run -- -e 127.0.0.1:8081 -i 01 -p 127.0.0.1:8082
Expected behavior
Stopping and starting nodes in a cluster should be handled gracefully.
System information (probably not relevant)
- CPU architecture: x86
- Distribution and kernel version: OS X 10.14.6
- SELinux on?: No
- Any other system details we should know?: no
Additional context
The shared repo is a minimal demo app trying to get raft-rs into a working state for a raft cluster.
/cc @darach
If a node has already joined a cluster, you should be able to just restart it without sending a re-join message. @Licenser
There is no rejoin message sent; that's the odd thing about it. And node 1 restarts, but it claims that node 2 (one node) is re-added.
Node 2 and node 3 both panicked immediately when I tried to reproduce the scenario in step 8:
Nov 01 00:24:56.636 ERRO e: The node 3 already exists in the voters set., raft_id: 2
Nov 01 00:24:56.648 ERRO e: The node 3 already exists in the voters set., raft_id: 3
Maybe there is something wrong in the state machine 🤔️.
I am baffled. I've been dancing around this for a day and tried both the 0.6.0 crate and git. The error changes slightly if I don't persist the ConfState, but that feels wrong and causes other issues.
After some digging I noticed that logs are not replicated to joining nodes. So nodes 2 and 3, after joining, do share the same snapshot, but they don't share the same logs - could that be where the error originates from?
@Hoverbear PTAL
Hmm...
This is typically caused by one of these:
https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/progress/progress_set.rs#L301-L341
You can see when a node is removed from the ProgressSet we remove it from the voter set:
https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/progress/progress_set.rs#L347-L355
That's called by remove_node here:
https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/raft.rs#L2104-L2123
Which is called via apply_conf_change here:
https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/raw_node.rs#L324-L340
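The invariant those links enforce can be modeled in a few lines. This is a stripped-down stand-in for ProgressSet's voter bookkeeping, not the real raft-rs code, but it shows why applying AddNode twice without a Remove in between produces exactly this error:

```rust
use std::collections::HashSet;

/// Toy stand-in for raft-rs's ProgressSet voter bookkeeping.
/// The real type also tracks learners and per-peer Progress.
struct VoterSet {
    voters: HashSet<u64>,
}

impl VoterSet {
    fn new() -> Self {
        VoterSet { voters: HashSet::new() }
    }

    /// Mirrors insert_voter: adding an id that is already a voter
    /// is an error ("The node N already exists in the voters set.").
    fn insert_voter(&mut self, id: u64) -> Result<(), String> {
        if self.voters.contains(&id) {
            return Err(format!("The node {} already exists in the voters set.", id));
        }
        self.voters.insert(id);
        Ok(())
    }

    /// Mirrors remove_node: only an explicit removal frees the id.
    fn remove(&mut self, id: u64) {
        self.voters.remove(&id);
    }
}

fn main() {
    let mut vs = VoterSet::new();
    vs.insert_voter(2).unwrap();          // first AddNode: fine
    assert!(vs.insert_voter(2).is_err()); // AddNode again without a Remove: Err
    vs.remove(2);
    vs.insert_voter(2).unwrap();          // after an explicit Remove it works again
}
```

Unwrapping that Err inside on_ready is what turns a duplicate AddNode entry into the panic in the backtrace above.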
I see you're updating it:
https://github.com/wayfair-incubator/uring/blob/a0a5ffa2990e2e8c91f9849cd7eefdd082501543/src/raft_node.rs#L333-L345
I need to play with this some :)
Thanks for taking the time to provide all the context!
The assumption made is that adding and removing nodes via conf change events is for orderly growing or shrinking of the raft group, not for intermittent failures. So currently there is no code path in which we send a remove-node conf change event.
So basically, in the code we call add_node only when a new node joins the cluster (using the simplest possible algorithm of pre-defined/shared node ids and whether they're known or not). We haven't gotten to removing nodes yet, since we're still battling the restart case.
So some preliminary tests:
- Online 1, Election (L1), Online 2 & 3, Pause, Down 2 or 3, no issues. Rejoin is fine.
- Online 1, Election (L1), Online 2 & 3, Pause, Down 1, panic.
trace:
Nov 06 10:53:55.571 DEBG Sending from 2 to 1, msg: msg_type: MsgHeartbeatResponse to: 1, to: 1, from: 2, raft_id: 2
[WS Onramp] Connection terminated.
system stopped
Error: Failed to connect to host: Connection refused (os error 111)
Nov 06 10:53:55.745 WARN down(local), id: 1
Nov 06 10:53:57.321 INFO starting a new election, term: 2, raft_id: 2
Nov 06 10:53:57.321 INFO became pre-candidate at term 2, term: 2, raft_id: 2
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.322 INFO 2 received message from 2, term: 2, msg: MsgRequestPreVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.323 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestPreVote, term: 2, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.324 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestPreVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.325 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestPreVote, term: 2, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.326 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestPreVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.327 DEBG State transition, next-state: PreCandidate, last-state: Follower
Nov 06 10:53:57.332 INFO received from 3, term: 2, msg type: MsgRequestPreVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.333 DEBG reset election timeout 18 -> 16 at 0, election_elapsed: 0, timeout: 16, prev_timeout: 18, raft_id: 2
Nov 06 10:53:57.333 INFO became candidate at term 3, term: 3, raft_id: 2
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.334 INFO 2 received message from 2, term: 3, msg: MsgRequestVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.335 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestVote, term: 3, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.336 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.337 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestVote, term: 3, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.338 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.339 DEBG State transition, next-state: Candidate, last-state: PreCandidate
Nov 06 10:53:57.344 INFO received from 3, term: 3, msg type: MsgRequestVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.345 DEBG reset election timeout 16 -> 19 at 0, election_elapsed: 0, timeout: 19, prev_timeout: 16, raft_id: 2
Nov 06 10:53:57.345 INFO became leader at term 3, term: 3, raft_id: 2
[src/raft_node.rs:434] raft_group.raft.raft_log.last_index() + 1 = 7
Nov 06 10:53:57.345 DEBG Sending from 2 to 1, msg: msg_type: MsgAppend to: 1 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.346 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.347 DEBG State transition, next-state: Leader, last-state: Candidate
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.409 DEBG committing index 6, index: 6, raft_id: 2
Nov 06 10:53:57.409 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 6 entries {entry_type: EntryConfChange term: 3 index: 7 data: "\030\003"} commit: 6, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.458 DEBG committing index 7, index: 7, raft_id: 2
Nov 06 10:53:57.458 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 7 commit: 7, to: 3, from: 2, raft_id: 2
[src/raft_node.rs:343] &cc = node_id: 3
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Exists(3, "voters")', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Nov 06 10:53:57.459 DEBG adding node (learner: false) with ID 3 to peers., id: 3, learner: false, raft_id: 2
Nov 06 10:53:57.460 DEBG Inserting voter with id 3, id: 3, raft_id: 2
Nov 06 10:53:57.461 ERRO e: The node 3 already exists in the voters set., raft_id: 2
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
stack backtrace:
0: 0x56294b85ced4 - backtrace::backtrace::libunwind::trace::hf9ad636648efb226
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: 0x56294b85ced4 - backtrace::backtrace::trace_unsynchronized::hed853f204024d151
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: 0x56294b85ced4 - std::sys_common::backtrace::_print_fmt::h540be03650997d7f
at src/libstd/sys_common/backtrace.rs:77
3: 0x56294b85ced4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h859895b5f65feccf
at src/libstd/sys_common/backtrace.rs:61
4: 0x56294b8805ac - core::fmt::write::h55a2a37a9ae04dad
at src/libcore/fmt/mod.rs:1028
5: 0x56294b857d87 - std::io::Write::write_fmt::h55b418b8331f602e
at src/libstd/io/mod.rs:1412
6: 0x56294b85f44e - std::sys_common::backtrace::_print::h01cf1aa97e3bff6e
at src/libstd/sys_common/backtrace.rs:65
7: 0x56294b85f44e - std::sys_common::backtrace::print::h992feaba39b9b6be
at src/libstd/sys_common/backtrace.rs:50
8: 0x56294b85f44e - std::panicking::default_hook::{{closure}}::hf519e5920434d96d
at src/libstd/panicking.rs:188
9: 0x56294b85f141 - std::panicking::default_hook::ha5560763526d8f2a
at src/libstd/panicking.rs:205
10: 0x56294b85fb4b - std::panicking::rust_panic_with_hook::h23e3b045936d3e0a
at src/libstd/panicking.rs:464
11: 0x56294b85f6ee - std::panicking::continue_panic_fmt::h22cad8a6b4a9a888
at src/libstd/panicking.rs:373
12: 0x56294b85f5d6 - rust_begin_unwind
at src/libstd/panicking.rs:302
13: 0x56294b87c4ce - core::panicking::panic_fmt::h9f376b306ccc647c
at src/libcore/panicking.rs:139
14: 0x56294b87c5c7 - core::result::unwrap_failed::ha857e2e239846901
at src/libcore/result.rs:1165
15: 0x56294a38d6d4 - core::result::Result<T,E>::unwrap::h2b62ba69aa6e5d84
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/result.rs:933
16: 0x56294a3e2a1a - <uring::UrSocket as actix::actor::Actor>::stopped::h39bc887b77fa9f3d
at src/main.rs:129
17: 0x56294a31dec8 - <actix::contextimpl::ContextFut<A,C> as futures::future::Future>::poll::h20f0330cfe0a782f
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:438
18: 0x56294a31c705 - <actix::contextimpl::ContextFut<A,C> as core::ops::drop::Drop>::drop::h7575ff147dd421c0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:220
19: 0x56294a36ad01 - core::ptr::real_drop_in_place::h0f7834c8d060f1e5
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
20: 0x56294a36b521 - core::ptr::real_drop_in_place::h15157f2273174a4a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
21: 0x56294a382eb1 - core::ptr::real_drop_in_place::hfd3912dcfbeb53e5
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
22: 0x56294a379348 - core::ptr::real_drop_in_place::h9bdb1342952ae961
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
23: 0x56294a3715bb - core::ptr::real_drop_in_place::h4cd96d3c8d8904ab
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
24: 0x56294a377cd5 - core::ptr::real_drop_in_place::h89d944f4395f2954
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
25: 0x56294a380383 - core::ptr::real_drop_in_place::he465ebf8b5d81fc4
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
26: 0x56294a36e565 - core::ptr::real_drop_in_place::h303c9666062b1852
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
27: 0x56294a373439 - core::ptr::real_drop_in_place::h639d6c80397d3994
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
28: 0x56294a37c476 - core::ptr::real_drop_in_place::hbd286707c9379e36
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
29: 0x56294a369842 - core::ptr::real_drop_in_place::h050601c6fa9adbb3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
30: 0x56294a382e91 - core::ptr::real_drop_in_place::hfd1b973e3f3405b7
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
31: 0x56294a37d47f - core::ptr::real_drop_in_place::hc6257ba62dd3ee4e
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
32: 0x56294a3800a1 - core::ptr::real_drop_in_place::he21e27ce352065f6
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
33: 0x56294a37ca48 - core::ptr::real_drop_in_place::hbfdd5e57c1ea6a8a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
34: 0x56294a36fd81 - core::ptr::real_drop_in_place::h3e1b516237dc7899
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
35: 0x56294a36f1f4 - core::ptr::real_drop_in_place::h384c3f60fbd1cfd3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
36: 0x56294aeb48b8 - core::ptr::real_drop_in_place::h25dd0baed7f0f375
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
37: 0x56294aeb4fe1 - core::ptr::real_drop_in_place::hdd6aac9bbf1b25da
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
38: 0x56294aeb4851 - core::ptr::real_drop_in_place::h1bc710e61ac51dd8
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
39: 0x56294aeb488f - core::ptr::real_drop_in_place::h20559ea5933991c4
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
40: 0x56294aeb6d47 - core::mem::drop::h7804235b62a558b2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/mem/mod.rs:704
41: 0x56294ae10035 - tokio_current_thread::scheduler::release_node::h72a3b1970158c2e8
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:382
42: 0x56294ae0ff4f - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::{{closure}}::hf313f7599952abe0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
43: 0x56294ae0cbe4 - tokio_current_thread::Borrow<U>::enter::{{closure}}::{{closure}}::h93152cf040b0cfc6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
44: 0x56294ae0c593 - tokio_current_thread::CurrentRunner::set_spawn::hb299958eebe1449b
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:825
45: 0x56294ae0ca85 - tokio_current_thread::Borrow<U>::enter::{{closure}}::h8bb87fc431b3369e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
46: 0x56294adf4b1a - std::thread::local::LocalKey<T>::try_with::h4b4da45cb0953780
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
47: 0x56294adf3568 - std::thread::local::LocalKey<T>::with::hbe6f12a20964b981
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
48: 0x56294ae0c61b - tokio_current_thread::Borrow<U>::enter::h0d6f514f2853732f
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:786
49: 0x56294ae0feb7 - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::h60e904c13e6346e3
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
50: 0x56294ae066a5 - core::ptr::real_drop_in_place::h87d3db269008cc8c
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
51: 0x56294ae11879 - tokio_current_thread::scheduler::Scheduler<U>::tick::h8ba9f0e91fe9beb0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:342
52: 0x56294ae0ce2d - tokio_current_thread::Entered<P>::tick::h015233dbf859aa87
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:612
53: 0x56294ae0cff0 - tokio_current_thread::Entered<P>::block_on::h5c966374bfc009d6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:502
54: 0x56294ae0168e - actix_rt::runtime::Runtime::block_on::{{closure}}::ha255279c9e9e0678
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:128
55: 0x56294ae01d61 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::{{closure}}::h71c41cadaf16c741
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:168
56: 0x56294ae2851d - tokio_executor::global::with_default::{{closure}}::hf2f1bdd7887fd578
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:209
57: 0x56294adf6448 - std::thread::local::LocalKey<T>::try_with::heda6599f2047da06
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
58: 0x56294adf348c - std::thread::local::LocalKey<T>::with::hb8098ea981ebb349
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
59: 0x56294ae280a2 - tokio_executor::global::with_default::hc1252736b4e426c1
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:178
60: 0x56294ae01f7e - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::h9de40e2003eada88
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:166
61: 0x56294ae19dc2 - tokio_timer::timer::handle::with_default::{{closure}}::h6e3dd5bd04bbe4cc
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:101
62: 0x56294adf3a28 - std::thread::local::LocalKey<T>::try_with::h04334be1dd5a6501
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
63: 0x56294adf2c1c - std::thread::local::LocalKey<T>::with::h1110cb2dd7a5e41a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
64: 0x56294ae19955 - tokio_timer::timer::handle::with_default::h6e7191e9227a3395
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:84
65: 0x56294ae020a9 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::hcb4ffcbdedde3879
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:159
66: 0x56294adf0df8 - tokio_timer::clock::clock::with_default::{{closure}}::h4d2baf6a8640417e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:137
67: 0x56294adf4844 - std::thread::local::LocalKey<T>::try_with::h44e90ccbce8ba7a0
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
68: 0x56294adf38ad - std::thread::local::LocalKey<T>::with::hf8efabeded619c12
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
69: 0x56294adf0a7e - tokio_timer::clock::clock::with_default::h744d246f9ee1c3c7
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:117
70: 0x56294ae020fd - actix_rt::runtime::Runtime::enter::{{closure}}::h25beeae16b0aab2b
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:158
71: 0x56294ae1a53f - tokio_reactor::with_default::{{closure}}::h047eea81de149d26
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:237
72: 0x56294adf5544 - std::thread::local::LocalKey<T>::try_with::h983b47c211a53258
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
73: 0x56294adf2b9d - std::thread::local::LocalKey<T>::with::h0ddb474724c356fd
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
74: 0x56294ae1a318 - tokio_reactor::with_default::hedeceb1f1204f41e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:217
75: 0x56294ae0196c - actix_rt::runtime::Runtime::enter::he0a8a92e295c18e4
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:157
76: 0x56294ae01478 - actix_rt::runtime::Runtime::block_on::h5a965072a66504f6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:126
77: 0x56294ae26303 - actix_rt::arbiter::Arbiter::new::{{closure}}::h03e49a8780bdb49d
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/arbiter.rs:113
78: 0x56294ae1f0f5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h11b303417e3a8564
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/sys_common/backtrace.rs:129
79: 0x56294ae040c1 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::he7652596c7722712
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:469
80: 0x56294adf6664 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h8613466756385bec
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:317
81: 0x56294adf1911 - std::panicking::try::do_call::h711ea6f48f6c4718
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:287
82: 0x56294b86414a - __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:78
83: 0x56294adf1768 - std::panicking::try::h7cbe878e6ead05a2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:265
84: 0x56294adf69d6 - std::panic::catch_unwind::hf74332f300029515
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:396
85: 0x56294ae03eb2 - std::thread::Builder::spawn_unchecked::{{closure}}::h7970385913689ac2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:468
86: 0x56294ae04434 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf90741fee7c7944d
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ops/function.rs:227
87: 0x56294b85100f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hab0345739862a0e3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
88: 0x56294b863210 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h97269b0dcf5f4700
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
89: 0x56294b863210 - std::sys_common::thread::start_thread::h6d1cb01c00583451
at src/libstd/sys_common/thread.rs:13
90: 0x56294b863210 - std::sys::unix::thread::Thread::new::thread_start::h2c73c5e5c4fa31f9
at src/libstd/sys/unix/thread.rs:79
91: 0x7f4cd68f4669 - start_thread
92: 0x7f4cd6800323 - clone
93: 0x0 - <unknown>
thread panicked while panicking. aborting.
Illegal instruction
Seems this bit is what's important:
Nov 06 10:53:57.459 DEBG adding node (learner: false) with ID 3 to peers., id: 3, learner: false, raft_id: 2
Nov 06 10:53:57.460 DEBG Inserting voter with id 3, id: 3, raft_id: 2
Nov 06 10:53:57.461 ERRO e: The node 3 already exists in the voters set., raft_id: 2
Ja, which is odd - the only time we fire an AddNode event is if the raft state says it's an unknown node: https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L557. Since node 1 did see 2 and 3 before and persists the state to disk, it should never get there - I don't see any add-node messages fired either.
Hmm, so ConfChange should normally come from Raft log entries. This is the replicated log, so 'injecting' an entry is inherently unsafe. I suggest you remove that code. (https://github.com/wayfair-incubator/uring/blob/a0a5ffa2990e2e8c91f9849cd7eefdd082501543/src/main.rs#L557-L584) :)
If you want to add a node, you can do it like this:
https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/examples/five_mem_node/main.rs#L405-L419
This is the correct way to add something to the log.
But I'm not sure you want to try to do that in this situation. :) Just commenting out that code seems to remove the panic...
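To illustrate the difference, here is a toy model of the propose -> commit -> apply flow that the linked five_mem_node example follows. This is not raft-rs itself, just the shape of the flow - a conf change travels through the replicated log and is applied exactly once when committed, never injected directly:

```rust
use std::collections::HashSet;

/// Toy conf-change entries as they would appear in the replicated log.
#[derive(Clone, Copy)]
enum ConfChange {
    AddNode(u64),
    RemoveNode(u64),
}

/// Toy node: a log, an applied cursor, and the resulting voter set.
struct Node {
    log: Vec<ConfChange>,
    applied: usize,
    voters: HashSet<u64>,
}

impl Node {
    fn new() -> Self {
        Node { log: Vec::new(), applied: 0, voters: HashSet::new() }
    }

    /// The safe path: append the change to the log (in raft-rs this is
    /// raw_node.propose_conf_change, which also replicates it to peers).
    /// Nothing is applied here.
    fn propose(&mut self, cc: ConfChange) {
        self.log.push(cc);
    }

    /// Changes take effect only once committed, in log order, exactly once.
    /// Injecting entries by hand breaks this exactly-once property, which
    /// is what the Exists("voters") error guards against.
    fn apply_committed(&mut self, commit: usize) {
        while self.applied < commit.min(self.log.len()) {
            match self.log[self.applied] {
                ConfChange::AddNode(id) => { self.voters.insert(id); }
                ConfChange::RemoveNode(id) => { self.voters.remove(&id); }
            }
            self.applied += 1;
        }
    }
}

fn main() {
    let mut n = Node::new();
    n.propose(ConfChange::AddNode(1));
    n.propose(ConfChange::AddNode(2));
    n.apply_committed(2);
    assert_eq!(n.voters.len(), 2);
    // Re-applying up to the same commit index is a no-op.
    n.apply_committed(2);
    assert_eq!(n.voters.len(), 2);
}
```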
It does seem like the remaining followers fail to correctly hold a new election though. I'm going to let you take a look and let me know if this helps. :)
Also: I'm sorry the docs aren't so clear. :( We are trying to improve it and would definitely love any concrete issues/prs with improvements.
No reason to be sorry, I think everyone who has ever worked on a project of any kind of complexity can understand, it’s hard to document what you know since everything is obvious at that point ;)
To the code: we started with a direct copy, but had to add the second change event or the other nodes didn't learn about the first node. Even after that, the method is still the same - it gets published over a proposal and agreed on, not injected into the logs. But I'll try to remove/replace the code later; perhaps the issue it was originally added for was a different one. That said, it's a bit different from the memory nodes, since those all exist in their own world while ours are networked - we can't just join 5 nodes at boot time, they have to grow as they connect.
You should definitely be able to grow/shrink your Raft cluster online. :)
So I tried the suggestion of commenting out the code, but then it just doesn't crash because it never creates a group. Node 1 knows itself as a voter, but neither node 2 nor node 3 knows of any other node.
Node1
Nov 06 21:26:57.192 DEBG NODE STATE, remote-mailboxes: [2, 3], local-mailboxes: [], randomized-election-timeout: 15, election-elapsed: 4, pass-election-timeout: false, promotable: true, voters: {1}, votes: {}, vote: 1, last-index: 2, first-index: 2, term: 2, leader-id: 1, role: Leader, node-id: 1
Node2
Nov 06 21:27:13.903 DEBG NODE STATE, remote-mailboxes: [3], local-mailboxes: [1], randomized-election-timeout: 12, election-elapsed: 565, pass-election-timeout: true, promotable: false, voters: {}, votes: {}, vote: 0, last-index: 0, first-index: 1, term: 0, leader-id: 0, role: Follower, node-id: 2
Node3
Nov 06 21:26:24.032 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [2, 1], randomized-election-timeout: 14, election-elapsed: 93, pass-election-timeout: true, promotable: false, voters: {}, votes: {}, vote: 0, last-index: 0, first-index: 1, term: 0, leader-id: 0, role: Follower, node-id: 3
I'll try to make an explicit 'add node' action that isn't connected to the join on the cluster; perhaps that works better.
So I changed how change events are sent and made it explicit. (https://github.com/wayfair-incubator/uring/commit/a791b33008faa690bc6d643b9cf223db9852bbb1)
That shifts the problem slightly; it now goes to (after restarting node 1 (the initial leader), with node 3 as the new leader - on node 1):
Nov 06 21:43:09.803 WARN down(local), id: 3
thread '<unnamed>' panicked at 'slice[6,6] out of bound[2,4], raft_id: 1', /Users/heinz/.cargo/git/checkouts/raft-rs-42b8049ef2e3af07/2ce67a0/src/raft_log.rs:420:13
which is a bit odd since none of the nodes has 6 as an index.
1 (before it crashed)
Nov 06 21:43:01.487 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [3, 2], randomized-election-timeout: 10, election-elapsed: 2, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {}, vote: 0, last-index: 4, first-index: 2, term: 3, leader-id: 3, role: Follower, node-id: 1
2
Nov 06 21:44:58.602 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [], randomized-election-timeout: 10, election-elapsed: 6, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {2: true}, vote: 3, last-index: 5, first-index: 4, term: 3, leader-id: 0, role: PreCandidate, node-id: 2
3
Nov 06 21:43:02.790 DEBG NODE STATE, remote-mailboxes: [1], local-mailboxes: [2], randomized-election-timeout: 12, election-elapsed: 3, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {}, vote: 3, last-index: 5, first-index: 5, term: 3, leader-id: 3, role: Leader, node-id: 3
Hmmm, that's a bit of a puzzle, isn't it!
I see that index 5 is done:
Nov 07 12:17:02.582 INFO [commit: 5, term: 3] restored snapshot [index: 5, term: 3], snapshot_term: 3, snapshot_index: 5, commit: 5, term: 3, raft_id: 1
Yup, but only on nodes 2 & 3; node 1 is still at 4 - I suspect 5 was the election of the new leader when 1 was killed, so it makes sense that it doesn't have it yet. I'm still unsure where it gets 6 from.
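For reference, the shape of the check behind that panic message - a toy version, not raft-rs's actual raft_log code - is a simple range test against the log's first and last index:

```rust
/// Toy version of the slice bounds check that produces
/// "slice[6,6] out of bound[2,4]": a log whose entries span
/// [first_index, last_index] rejects requests outside that range.
fn check_slice(lo: u64, hi: u64, first_index: u64, last_index: u64) -> Result<(), String> {
    if lo < first_index || hi > last_index + 1 {
        return Err(format!(
            "slice[{},{}] out of bound[{},{}]",
            lo, hi, first_index, last_index
        ));
    }
    Ok(())
}

fn main() {
    // Node 1's log above spans [2,4]; asking for [6,6] is out of bounds,
    // which suggests something (a commit index or a next-index probe)
    // points past entries node 1 never received.
    assert!(check_slice(6, 6, 2, 4).is_err());
    assert!(check_slice(2, 4, 2, 4).is_ok());
}
```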
Yup I misread and deleted the comment :-P
:D ah sorry, I just saw the mail.
@Licenser If you don't call curl -X POST 127.0.0.1:8081/node/1 in your reproduction case it seems to work without panicking... At least raft is trying to send the correct messages but it seems to be unable to.

Thanks! That's an interesting observation. I didn't try that yet - it's a bit late here (close to midnight) but I'll give that a try tomorrow and keep digging.
The failure to send messages might be because the node was booted without knowing its peers. That's an easy thing to fix (famous last words), or at least easy to work around for the sake of getting the raft-rs use case working enough to have a minimal functioning kv store: just pass all peers to all nodes with more -p flags on the command line (obviously not a production-grade solution 😂).
Yeah, let's get this working! :) Would love to see more raft-rs derivatives and this is informing how we can better document things.
(Un)Fortunately we're updating our membership change mechanisms (hello Joint Consensus) hopefully soon once we resolve some lingering safety issues with the etcd folks.
If you need help learning to use Raft @luciofranco made a simple KV here: https://github.com/LucioFranco/kv, and the TiKV Raftstore is a maximal example: https://github.com/tikv/tikv/tree/master/src/raftstore
Learning raft isn't really my problem; I'm not unfamiliar with it. Working through the assumptions in raft-rs is where I am stuck :(.
I looked at @LucioFranco's kv and as far as I can tell it suffers from the same problems.
bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."
I gave not adding node 1 a try, and it ends with the same issue of trying to access index 6; it just takes one more kill to get there (start1, start2, start3, join2, join3, kill1, start1, kill2 (was leader), join2 => tries to access index 6).
So I've come full circle with this: I'm back at the original 'node already known' problem. The solution to that was to not load the ConfState from disk when restarting a node. That, on the other hand, causes nodes to forget their peers: restarting 1, it only comes up with voters 2 and 3 (but not itself); restarting 2, it only has 3 as a voter, but neither 1 nor itself.
More generally, it seems that nodes only know about nodes added after them: node 1 after a restart knows about 2/3, node 2 after a restart knows about 3, and node 3 after a restart knows none.
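That pattern - each node knowing only the peers added after it - is what you'd expect if membership is rebuilt by replaying only the node's own log suffix, since a node's log starts after its own AddNode entry. A toy illustration (the node ids and join order are just the ones from this thread):

```rust
use std::collections::HashSet;

/// Toy replay: the cluster's conf-change history is AddNode(1),
/// AddNode(2), AddNode(3) in commit order. A node's own log only
/// starts after its own AddNode entry, so replaying just that
/// suffix yields only the later-joined peers.
fn replay_suffix(history: &[u64], own_position: usize) -> HashSet<u64> {
    history[own_position + 1..].iter().copied().collect()
}

fn main() {
    let history = [1u64, 2, 3];

    // node 1 (position 0) recovers only 2 and 3 - not itself
    let expected: HashSet<u64> = [2u64, 3].iter().copied().collect();
    assert_eq!(replay_suffix(&history, 0), expected);

    // node 3 (position 2) recovers no peers at all
    assert!(replay_suffix(&history, 2).is_empty());

    // The full voter set has to come from somewhere that carries all of
    // {1, 2, 3} - e.g. the snapshot's ConfState - not from the log suffix.
}
```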
bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."
At the join 1 step, why are you doing this? 1 was already added. Once a node is added to the set, it should only be removed via an explicit Remove. As you know from Raft, the leader (2 or 3) will continually try to reconnect to the lost node (1); once they manage to hear back, the node is recovered. It never needs to be removed unless you've actually explicitly given up on that node.
Here's how the process should go:
- Create & start Raft node 1, it becomes leader.
- Create Raft node 2 & 3 and idle them.
- Inform Leader (1) of node 2 & 3 through an AddNode proposal. Let it run and hold an election.
- The cluster is lively.
- Kill 1
- 2/3 hold an election, one wins.
- 1 recovers, begins listening and ticking its election timeout.
- Leader (2/3) either contacts it or hears from it and captures it. (the other follower would just redirect it to the current leader)
- The cluster is lively again.
So you'd do remove/add when you were decommissioning a host; if you were just taking a host down for maintenance or a reboot, you can just leave it in the membership pool. :)
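The restart-time decision this implies can be sketched as follows. The type and function names here are illustrative stand-ins, not the raft-rs API: rebuild membership from the persisted ConfState and propose AddNode only for ids that are genuinely new.

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the membership a persisted raft-rs
/// ConfState would carry (the real type lives in the raft proto).
struct PersistedConfState {
    voters: HashSet<u64>,
}

/// On restart, restore the voter set from persisted state instead of
/// re-proposing AddNode for peers that are already voters. AddNode is
/// only for genuinely new ids; a rebooted node just rejoins as-is and
/// gets caught up by the leader.
fn should_propose_add_node(persisted: &PersistedConfState, id: u64) -> bool {
    !persisted.voters.contains(&id)
}

fn main() {
    let persisted = PersistedConfState {
        voters: [1u64, 2, 3].iter().copied().collect(),
    };
    // Node 1 comes back after a crash: it is already a voter, so no
    // conf change should be proposed - the leader simply reconnects.
    assert!(!should_propose_add_node(&persisted, 1));
    // A brand-new node 4 would need an AddNode proposal.
    assert!(should_propose_add_node(&persisted, 4));
}
```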
bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."
At the join 1 step, why are you doing this? 1 was already added. Once a node is added to the set, it should only be removed via an explicit Remove. As you know from Raft, the leader (2 or 3) will continually try to reconnect to the lost node (1); once they manage to hear back, the node is recovered. It never needs to be removed unless you've actually explicitly given up on that node.
kv needs to be called with either join or bootstrap; there is no other way to start it. I'd rather not call either, but that isn't an option.
Here's how the process should go:
- Create & start Raft node 1, it becomes leader.
- Create Raft node 2 & 3 and idle them.
- Inform Leader (1) of node 2 & 3 through an AddNode proposal. Let it run and hold an election.
- The cluster is lively.
- Kill 1
- 2/3 hold an election, one wins.
- 1 recovers, begins listening and ticking its election timeout.
This is where it gets problematic: after recovery the nodes either run into a "node already known" problem (when the ConfState is persisted to disk and re-loaded) or they forget about the nodes. That's where I'm stuck at the moment.
- Leader (2/3) either contacts it or hears from it and captures it. (the other follower would just redirect it to the current leader)
- The cluster is lively again.
So you'd do remove/add when you were decommissioning a host; if you were just taking a host down for maintenance or a reboot, you can just leave it in the membership pool. :)
I know, there are no calls to add outside of the PUT node/X and no calls to remove at all. That's the mystery.
This is where it gets problematic: after recovery the nodes either run into a "node already known" problem (when the ConfState is persisted to disk and re-loaded) or they forget about the nodes. That's where I'm stuck at the moment.
So when 1 recovers and loads from the hard state, it should already know about 1, 2, 3, no? Why is it adding nodes?