
re-joining after simulated node crash panics on trying to re-add an already existing node.

Open Licenser opened this issue 6 years ago • 41 comments

Describe the bug

Re-joining a leader to a cluster crashes with an attempt to re-add an already known node to the progress state.

Oct 31 15:49:52.769 ERRO e: The node 2 already exists in the voters set., raft_id: 1
   0: backtrace::backtrace::libunwind::trace
             at /Users/vsts/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /Users/vsts/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.34/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:47
   3: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:36
   4: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:200
   5: std::panicking::default_hook
             at src/libstd/panicking.rs:214
   6: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:477
   7: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:384
   8: rust_begin_unwind
             at src/libstd/panicking.rs:311
   9: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  10: core::result::unwrap_failed
             at src/libcore/result.rs:1084
  11: core::result::Result<T,E>::unwrap
             at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/libcore/result.rs:852
  12: uring::raft_node::RaftNode::on_ready
             at src/raft_node.rs:325
  13: uring::loopy_thing
             at src/main.rs:635
  14: uring::main::{{closure}}
             at src/main.rs:693

To Reproduce

I set up a mini demo to replicate:

  1. clone https://github.com/wayfair-incubator/uring/tree/2416031ac34759f002a9a1539b5a2a54bbd84946
  2. start node 1: cargo run -- -e 127.0.0.1:8081 -i 1
  3. wait for it to elect itself leader
  4. start node 2: cargo run -- -e 127.0.0.1:8082 -i 2 -p 127.0.0.1:8081
  5. wait for it to join the cluster
  6. start node 3: cargo run -- -e 127.0.0.1:8083 -i 3 -p 127.0.0.1:8081
  7. wait for it to join the cluster.
  8. Terminate node 1 (leader) CTRL+C
  9. node 2 or 3 will become leader
  10. restart node 1 and let it re-join the cluster cargo run -- -e 127.0.0.1:8081 -i 01 -p 127.0.0.1:8082

Expected behavior: Stopping and starting nodes in a cluster should be handled gracefully.

System information (probably not relevant)

  • CPU architecture: x86
  • Distribution and kernel version: OS X 10.14.6
  • SELinux on?: No
  • Any other system details we should know?: no

Additional context: The shared repo is a minimal demo app trying to get raft-rs into a usable state for a raft cluster.

Licenser avatar Oct 31 '19 15:10 Licenser

/cc @darach

Licenser avatar Oct 31 '19 15:10 Licenser

If a node has already joined a cluster, you should just restart it without sending a re-join message. @Licenser

Fullstop000 avatar Oct 31 '19 15:10 Fullstop000

There is no rejoin message sent; that's the odd thing about it. And node 1 restarts, but it claims that node 2 (a different node) is being re-added.

Licenser avatar Oct 31 '19 15:10 Licenser

Node 2 and node 3 both panicked immediately when I tried to reproduce the scenario in step 8:

Nov 01 00:24:56.636 ERRO e: The node 3 already exists in the voters set., raft_id: 2
Nov 01 00:24:56.648 ERRO e: The node 3 already exists in the voters set., raft_id: 3

Maybe there is something wrong in the state machine 🤔️.

Fullstop000 avatar Oct 31 '19 16:10 Fullstop000

I am baffled. I've been dancing around this for a day and tried both the 0.6.0 crate and git. The error changes slightly if I don't persist the ConfState, but that feels wrong and causes other issues.

Licenser avatar Oct 31 '19 16:10 Licenser

After some digging I noticed that logs are not replicated to joining nodes. So after joining, nodes 2 and 3 do share the same snapshot but they don't share the same logs - could that be where the error originates?

Licenser avatar Nov 05 '19 10:11 Licenser

@Hoverbear PTAL

Fullstop000 avatar Nov 06 '19 07:11 Fullstop000

Hmm...

This is typically caused by one of these:

https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/progress/progress_set.rs#L301-L341

You can see that when a node is removed from the ProgressSet, we remove it from the voter set:

https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/progress/progress_set.rs#L347-L355

That's called by remove_node here:

https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/raft.rs#L2104-L2123

Which is called via apply_conf_change here:

https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/src/raw_node.rs#L324-L340

I see you're updating it:

https://github.com/wayfair-incubator/uring/blob/a0a5ffa2990e2e8c91f9849cd7eefdd082501543/src/raft_node.rs#L333-L345
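For orientation, the committed-entry path that ends up in insert_voter usually looks roughly like the sketch below (assuming the raft-rs API around commit d34f6936 and the protobuf codec; store and persist_conf_state are placeholder names, not uring's actual code). insert_voter returns the Exists error shown above whenever the same AddNode ends up being applied twice:

    // Minimal sketch of the Ready / committed-entry loop (assumed API, placeholders noted above).
    use protobuf::Message as _;
    use raft::eraftpb::{ConfChange, EntryType};

    let mut ready = raw_node.ready();
    // ... persist entries / apply snapshot / send messages first ...
    if let Some(committed_entries) = ready.committed_entries.take() {
        for entry in committed_entries {
            if entry.get_entry_type() == EntryType::EntryConfChange {
                let mut cc = ConfChange::default();
                cc.merge_from_bytes(entry.get_data()).unwrap();
                // apply_conf_change() ends up in ProgressSet::insert_voter, which
                // returns the Exists(..) error seen above if the id is already a
                // voter - i.e. if the same AddNode is applied twice.
                let conf_state = raw_node.apply_conf_change(&cc).unwrap();
                store.persist_conf_state(&conf_state); // placeholder for durable storage
            }
        }
    }
    raw_node.advance(ready);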

I need to play with this some :)

Hoverbear avatar Nov 06 '19 18:11 Hoverbear

Thanks for taking the time to provide all the context!

The assumption we made is that adding and removing nodes via a conf change event is for orderly growing or shrinking of the raft group, not for intermittent failures. So currently there is no code path in which we send a remove-node conf change event.

So basically, in the code we call add_node only when a new node joins the cluster (using the simplest possible algorithm of pre-defined/shared node IDs and whether they're known or not); we haven't gotten to removing nodes yet since we're still battling the restart case.

Licenser avatar Nov 06 '19 18:11 Licenser

So some preliminary tests:

  • Online 1, Election (L1), Online 2 & 3, Pause, Down 2 or 3, no issues. Rejoin is fine.
  • Online 1, Election (L1), Online 2 & 3, Pause, Down 1, panic.

trace:

Nov 06 10:53:55.571 DEBG Sending from 2 to 1, msg: msg_type: MsgHeartbeatResponse to: 1, to: 1, from: 2, raft_id: 2
[WS Onramp] Connection terminated.
system stopped
Error: Failed to connect to host: Connection refused (os error 111)
Nov 06 10:53:55.745 WARN down(local), id: 1
Nov 06 10:53:57.321 INFO starting a new election, term: 2, raft_id: 2
Nov 06 10:53:57.321 INFO became pre-candidate at term 2, term: 2, raft_id: 2
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.322 INFO 2 received message from 2, term: 2, msg: MsgRequestPreVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.323 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestPreVote, term: 2, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.324 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestPreVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.325 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestPreVote, term: 2, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.326 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestPreVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.327 DEBG State transition, next-state: PreCandidate, last-state: Follower
Nov 06 10:53:57.332 INFO received from 3, term: 2, msg type: MsgRequestPreVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.333 DEBG reset election timeout 18 -> 16 at 0, election_elapsed: 0, timeout: 16, prev_timeout: 18, raft_id: 2
Nov 06 10:53:57.333 INFO became candidate at term 3, term: 3, raft_idsend raft message to 1 fail, let Raft retry it
: 2
Nov 06 10:53:57.334 INFO 2 received message from 2, term: 3, msg: MsgRequestVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.335 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestVote, term: 3, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.336 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.337 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestVote, term: 3, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.338 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.339 DEBG State transition, next-state: Candidate, last-state: PreCandidate
Nov 06 10:53:57.344 INFO received from 3, term: 3, msg type: MsgRequestVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.345 DEBG reset election timeout 16 -> 19 at 0, election_elapsed: 0, timeout: 19, prev_timeout: 16, raft_id: 2
Nov 06 10:53:57.345 INFO became leader at term 3, [src/raft_node.rs:434] raft_group.raft.raft_log.last_index() + 1 = 7
term: 3, raft_id: 2
Nov 06 10:53:57.345 DEBG Sending from 2 to 1, msg: msg_type: MsgAppend to: 1 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.346 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.347 DEBG State transition, next-state: Leader, last-state: Candidate
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.409 DEBG committing index 6, index: 6, raft_id: 2
Nov 06 10:53:57.409 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 6 entries {entry_type: EntryConfChange term: 3 index: 7 data: "\030\003"} commit: 6, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.458 DEBG committing index 7, index: 7, raft_id: 2
Nov 06 10:53:57.458 DEBG Sending from 2 to 3[src/raft_node.rs:343] &cc = node_id: 3

, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 7 commit: 7, tothread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Exists(3, "voters")', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
: 3, from: 2, raft_id: 2
Nov 06 10:53:57.459 DEBG adding node (learner: false) with ID 3 to peers., id: 3, learner: false, raft_id: 2
Nov 06 10:53:57.460 DEBG Inserting voter with id 3, id: 3, raft_id: 2
Nov 06 10:53:57.461 ERRO e: The node 3 already exists in the voters set., raft_id: 2
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
stack backtrace:
   0:     0x56294b85ced4 - backtrace::backtrace::libunwind::trace::hf9ad636648efb226
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x56294b85ced4 - backtrace::backtrace::trace_unsynchronized::hed853f204024d151
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:     0x56294b85ced4 - std::sys_common::backtrace::_print_fmt::h540be03650997d7f
                               at src/libstd/sys_common/backtrace.rs:77
   3:     0x56294b85ced4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h859895b5f65feccf
                               at src/libstd/sys_common/backtrace.rs:61
   4:     0x56294b8805ac - core::fmt::write::h55a2a37a9ae04dad
                               at src/libcore/fmt/mod.rs:1028
   5:     0x56294b857d87 - std::io::Write::write_fmt::h55b418b8331f602e
                               at src/libstd/io/mod.rs:1412
   6:     0x56294b85f44e - std::sys_common::backtrace::_print::h01cf1aa97e3bff6e
                               at src/libstd/sys_common/backtrace.rs:65
   7:     0x56294b85f44e - std::sys_common::backtrace::print::h992feaba39b9b6be
                               at src/libstd/sys_common/backtrace.rs:50
   8:     0x56294b85f44e - std::panicking::default_hook::{{closure}}::hf519e5920434d96d
                               at src/libstd/panicking.rs:188
   9:     0x56294b85f141 - std::panicking::default_hook::ha5560763526d8f2a
                               at src/libstd/panicking.rs:205
  10:     0x56294b85fb4b - std::panicking::rust_panic_with_hook::h23e3b045936d3e0a
                               at src/libstd/panicking.rs:464
  11:     0x56294b85f6ee - std::panicking::continue_panic_fmt::h22cad8a6b4a9a888
                               at src/libstd/panicking.rs:373
  12:     0x56294b85f5d6 - rust_begin_unwind
                               at src/libstd/panicking.rs:302
  13:     0x56294b87c4ce - core::panicking::panic_fmt::h9f376b306ccc647c
                               at src/libcore/panicking.rs:139
  14:     0x56294b87c5c7 - core::result::unwrap_failed::ha857e2e239846901
                               at src/libcore/result.rs:1165
  15:     0x56294a38d6d4 - core::result::Result<T,E>::unwrap::h2b62ba69aa6e5d84
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/result.rs:933
  16:     0x56294a3e2a1a - <uring::UrSocket as actix::actor::Actor>::stopped::h39bc887b77fa9f3d
                               at src/main.rs:129
  17:     0x56294a31dec8 - <actix::contextimpl::ContextFut<A,C> as futures::future::Future>::poll::h20f0330cfe0a782f
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:438
  18:     0x56294a31c705 - <actix::contextimpl::ContextFut<A,C> as core::ops::drop::Drop>::drop::h7575ff147dd421c0
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:220
  19:     0x56294a36ad01 - core::ptr::real_drop_in_place::h0f7834c8d060f1e5
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  20:     0x56294a36b521 - core::ptr::real_drop_in_place::h15157f2273174a4a
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  21:     0x56294a382eb1 - core::ptr::real_drop_in_place::hfd3912dcfbeb53e5
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  22:     0x56294a379348 - core::ptr::real_drop_in_place::h9bdb1342952ae961
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  23:     0x56294a3715bb - core::ptr::real_drop_in_place::h4cd96d3c8d8904ab
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  24:     0x56294a377cd5 - core::ptr::real_drop_in_place::h89d944f4395f2954
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  25:     0x56294a380383 - core::ptr::real_drop_in_place::he465ebf8b5d81fc4
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  26:     0x56294a36e565 - core::ptr::real_drop_in_place::h303c9666062b1852
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  27:     0x56294a373439 - core::ptr::real_drop_in_place::h639d6c80397d3994
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  28:     0x56294a37c476 - core::ptr::real_drop_in_place::hbd286707c9379e36
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  29:     0x56294a369842 - core::ptr::real_drop_in_place::h050601c6fa9adbb3
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  30:     0x56294a382e91 - core::ptr::real_drop_in_place::hfd1b973e3f3405b7
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  31:     0x56294a37d47f - core::ptr::real_drop_in_place::hc6257ba62dd3ee4e
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  32:     0x56294a3800a1 - core::ptr::real_drop_in_place::he21e27ce352065f6
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  33:     0x56294a37ca48 - core::ptr::real_drop_in_place::hbfdd5e57c1ea6a8a
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  34:     0x56294a36fd81 - core::ptr::real_drop_in_place::h3e1b516237dc7899
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  35:     0x56294a36f1f4 - core::ptr::real_drop_in_place::h384c3f60fbd1cfd3
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  36:     0x56294aeb48b8 - core::ptr::real_drop_in_place::h25dd0baed7f0f375
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  37:     0x56294aeb4fe1 - core::ptr::real_drop_in_place::hdd6aac9bbf1b25da
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  38:     0x56294aeb4851 - core::ptr::real_drop_in_place::h1bc710e61ac51dd8
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  39:     0x56294aeb488f - core::ptr::real_drop_in_place::h20559ea5933991c4
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  40:     0x56294aeb6d47 - core::mem::drop::h7804235b62a558b2
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/mem/mod.rs:704
  41:     0x56294ae10035 - tokio_current_thread::scheduler::release_node::h72a3b1970158c2e8
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:382
  42:     0x56294ae0ff4f - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::{{closure}}::hf313f7599952abe0
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
  43:     0x56294ae0cbe4 - tokio_current_thread::Borrow<U>::enter::{{closure}}::{{closure}}::h93152cf040b0cfc6
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
  44:     0x56294ae0c593 - tokio_current_thread::CurrentRunner::set_spawn::hb299958eebe1449b
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:825
  45:     0x56294ae0ca85 - tokio_current_thread::Borrow<U>::enter::{{closure}}::h8bb87fc431b3369e
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
  46:     0x56294adf4b1a - std::thread::local::LocalKey<T>::try_with::h4b4da45cb0953780
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
  47:     0x56294adf3568 - std::thread::local::LocalKey<T>::with::hbe6f12a20964b981
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
  48:     0x56294ae0c61b - tokio_current_thread::Borrow<U>::enter::h0d6f514f2853732f
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:786
  49:     0x56294ae0feb7 - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::h60e904c13e6346e3
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
  50:     0x56294ae066a5 - core::ptr::real_drop_in_place::h87d3db269008cc8c
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
  51:     0x56294ae11879 - tokio_current_thread::scheduler::Scheduler<U>::tick::h8ba9f0e91fe9beb0
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:342
  52:     0x56294ae0ce2d - tokio_current_thread::Entered<P>::tick::h015233dbf859aa87
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:612
  53:     0x56294ae0cff0 - tokio_current_thread::Entered<P>::block_on::h5c966374bfc009d6
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:502
  54:     0x56294ae0168e - actix_rt::runtime::Runtime::block_on::{{closure}}::ha255279c9e9e0678
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:128
  55:     0x56294ae01d61 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::{{closure}}::h71c41cadaf16c741
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:168
  56:     0x56294ae2851d - tokio_executor::global::with_default::{{closure}}::hf2f1bdd7887fd578
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:209
  57:     0x56294adf6448 - std::thread::local::LocalKey<T>::try_with::heda6599f2047da06
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
  58:     0x56294adf348c - std::thread::local::LocalKey<T>::with::hb8098ea981ebb349
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
  59:     0x56294ae280a2 - tokio_executor::global::with_default::hc1252736b4e426c1
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:178
  60:     0x56294ae01f7e - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::h9de40e2003eada88
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:166
  61:     0x56294ae19dc2 - tokio_timer::timer::handle::with_default::{{closure}}::h6e3dd5bd04bbe4cc
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:101
  62:     0x56294adf3a28 - std::thread::local::LocalKey<T>::try_with::h04334be1dd5a6501
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
  63:     0x56294adf2c1c - std::thread::local::LocalKey<T>::with::h1110cb2dd7a5e41a
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
  64:     0x56294ae19955 - tokio_timer::timer::handle::with_default::h6e7191e9227a3395
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:84
  65:     0x56294ae020a9 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::hcb4ffcbdedde3879
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:159
  66:     0x56294adf0df8 - tokio_timer::clock::clock::with_default::{{closure}}::h4d2baf6a8640417e
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:137
  67:     0x56294adf4844 - std::thread::local::LocalKey<T>::try_with::h44e90ccbce8ba7a0
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
  68:     0x56294adf38ad - std::thread::local::LocalKey<T>::with::hf8efabeded619c12
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
  69:     0x56294adf0a7e - tokio_timer::clock::clock::with_default::h744d246f9ee1c3c7
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:117
  70:     0x56294ae020fd - actix_rt::runtime::Runtime::enter::{{closure}}::h25beeae16b0aab2b
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:158
  71:     0x56294ae1a53f - tokio_reactor::with_default::{{closure}}::h047eea81de149d26
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:237
  72:     0x56294adf5544 - std::thread::local::LocalKey<T>::try_with::h983b47c211a53258
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
  73:     0x56294adf2b9d - std::thread::local::LocalKey<T>::with::h0ddb474724c356fd
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
  74:     0x56294ae1a318 - tokio_reactor::with_default::hedeceb1f1204f41e
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:217
  75:     0x56294ae0196c - actix_rt::runtime::Runtime::enter::he0a8a92e295c18e4
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:157
  76:     0x56294ae01478 - actix_rt::runtime::Runtime::block_on::h5a965072a66504f6
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:126
  77:     0x56294ae26303 - actix_rt::arbiter::Arbiter::new::{{closure}}::h03e49a8780bdb49d
                               at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/arbiter.rs:113
  78:     0x56294ae1f0f5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h11b303417e3a8564
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/sys_common/backtrace.rs:129
  79:     0x56294ae040c1 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::he7652596c7722712
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:469
  80:     0x56294adf6664 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h8613466756385bec
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:317
  81:     0x56294adf1911 - std::panicking::try::do_call::h711ea6f48f6c4718
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:287
  82:     0x56294b86414a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  83:     0x56294adf1768 - std::panicking::try::h7cbe878e6ead05a2
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:265
  84:     0x56294adf69d6 - std::panic::catch_unwind::hf74332f300029515
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:396
  85:     0x56294ae03eb2 - std::thread::Builder::spawn_unchecked::{{closure}}::h7970385913689ac2
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:468
  86:     0x56294ae04434 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf90741fee7c7944d
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ops/function.rs:227
  87:     0x56294b85100f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hab0345739862a0e3
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
  88:     0x56294b863210 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h97269b0dcf5f4700
                               at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
  89:     0x56294b863210 - std::sys_common::thread::start_thread::h6d1cb01c00583451
                               at src/libstd/sys_common/thread.rs:13
  90:     0x56294b863210 - std::sys::unix::thread::Thread::new::thread_start::h2c73c5e5c4fa31f9
                               at src/libstd/sys/unix/thread.rs:79
  91:     0x7f4cd68f4669 - start_thread
  92:     0x7f4cd6800323 - clone
  93:                0x0 - <unknown>
thread panicked while panicking. aborting.
Illegal instruction

Hoverbear avatar Nov 06 '19 18:11 Hoverbear

Seems this bit is what's important:

Nov 06 10:53:57.459 DEBG adding node (learner: false) with ID 3 to peers., id: 3, learner: false, raft_id: 2
Nov 06 10:53:57.460 DEBG Inserting voter with id 3, id: 3, raft_id: 2
Nov 06 10:53:57.461 ERRO e: The node 3 already exists in the voters set., raft_id: 2

Hoverbear avatar Nov 06 '19 18:11 Hoverbear

Ja, which is odd; the only time we fire an AddNode event is if the raft state says it's an unknown node: https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L557 Since node 1 did see 2 and 3 before and persists the state to disk, it should never get there - I don't see any add-node messages being fired either.

Licenser avatar Nov 06 '19 19:11 Licenser

Hmm, so ConfChange should normally come from Raft log entries. This is the replicated log, so 'injecting' an entry is inherently unsafe. I suggest you remove that code. (https://github.com/wayfair-incubator/uring/blob/a0a5ffa2990e2e8c91f9849cd7eefdd082501543/src/main.rs#L557-L584) :)

If you want to add a node, you can try something like this:

https://github.com/tikv/raft-rs/blob/d34f6936e737563cc930bb8f03970a18f0cba7c9/examples/five_mem_node/main.rs#L405-L419

This is the correct way to add something to the log.
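As a rough sketch of that pattern (assuming the raft-rs API around commit d34f6936 with the protobuf codec; error handling elided, and raw_node standing in for your RawNode instance):

    // Propose an AddNode conf change on the leader; it goes through the replicated log.
    use raft::eraftpb::{ConfChange, ConfChangeType};

    let mut cc = ConfChange::default();
    cc.set_node_id(2);                            // id of the node to add
    cc.set_change_type(ConfChangeType::AddNode);
    // This appends an EntryConfChange to the replicated log; the change only
    // takes effect once the entry is committed and comes back through
    // Ready::committed_entries, where apply_conf_change() is called exactly once.
    raw_node.propose_conf_change(vec![], cc)?;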

But I'm not sure you want to try to do that in this situation. :) Just commenting out that code seems to remove the panic...

It does seem like the remaining followers fail to correctly hold a new election though. I'm going to let you take a look and let me know if this helps. :)

Hoverbear avatar Nov 06 '19 19:11 Hoverbear

Also: I'm sorry the docs aren't so clear. :( We are trying to improve them and would definitely love any concrete issues/PRs with improvements.

Hoverbear avatar Nov 06 '19 19:11 Hoverbear

No reason to be sorry - I think everyone who has ever worked on a project of any complexity can understand; it's hard to document what you know, since everything is obvious at that point ;)

To the code: we started with a direct copy but had to add the second change event, or else the other nodes didn't learn about the first node. But the method is, even after that, still the same - it gets published over a proposal and agreed on, not injected into the logs. I'll try to remove/replace the code later; perhaps the issue it was originally added for was a different one. That said, it's a bit different from the memory nodes, since they all exist in their own world and are networked, so we can't just join 5 nodes at boot time; they have to grow as they connect.

Licenser avatar Nov 06 '19 19:11 Licenser

You should definitely be able to grow/shrink your Raft cluster online. :)

Hoverbear avatar Nov 06 '19 19:11 Hoverbear

So I tried the suggestion of commenting out the code, but that just doesn't crash because it never creates a group. Node 1 knows itself as a voter, but neither node 2 nor node 3 knows of any other node.

Node1

Nov 06 21:26:57.192 DEBG NODE STATE, remote-mailboxes: [2, 3], local-mailboxes: [], randomized-election-timeout: 15, election-elapsed: 4, pass-election-timeout: false, promotable: true, voters: {1}, votes: {}, vote: 1, last-index: 2, first-index: 2, term: 2, leader-id: 1, role: Leader, node-id: 1

Node2

Nov 06 21:27:13.903 DEBG NODE STATE, remote-mailboxes: [3], local-mailboxes: [1], randomized-election-timeout: 12, election-elapsed: 565, pass-election-timeout: true, promotable: false, voters: {}, votes: {}, vote: 0, last-index: 0, first-index: 1, term: 0, leader-id: 0, role: Follower, node-id: 2

Node3

Nov 06 21:26:24.032 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [2, 1], randomized-election-timeout: 14, election-elapsed: 93, pass-election-timeout: true, promotable: false, voters: {}, votes: {}, vote: 0, last-index: 0, first-index: 1, term: 0, leader-id: 0, role: Follower, node-id: 3

I'll try to make an explicit 'add node' action that isn't connected to the join on the cluster; perhaps that works better.

Licenser avatar Nov 06 '19 20:11 Licenser

So I changed how change events are sent and made it explicit. (https://github.com/wayfair-incubator/uring/commit/a791b33008faa690bc6d643b9cf223db9852bbb1)

That shifts the problem slightly; it now goes to this (after restarting node 1 (the initial leader), and node 3 (the new leader) - on node 1):

Nov 06 21:43:09.803 WARN down(local), id: 3
thread '<unnamed>' panicked at 'slice[6,6] out of bound[2,4], raft_id: 1', /Users/heinz/.cargo/git/checkouts/raft-rs-42b8049ef2e3af07/2ce67a0/src/raft_log.rs:420:13

which is a bit odd since none of the nodes has 6 as an index.

1 (before it crashed)

Nov 06 21:43:01.487 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [3, 2], randomized-election-timeout: 10, election-elapsed: 2, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {}, vote: 0, last-index: 4, first-index: 2, term: 3, leader-id: 3, role: Follower, node-id: 1

2

Nov 06 21:44:58.602 DEBG NODE STATE, remote-mailboxes: [], local-mailboxes: [], randomized-election-timeout: 10, election-elapsed: 6, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {2: true}, vote: 3, last-index: 5, first-index: 4, term: 3, leader-id: 0, role: PreCandidate, node-id: 2

3

Nov 06 21:43:02.790 DEBG NODE STATE, remote-mailboxes: [1], local-mailboxes: [2], randomized-election-timeout: 12, election-elapsed: 3, pass-election-timeout: false, promotable: true, voters: {1, 2, 3}, votes: {}, vote: 3, last-index: 5, first-index: 5, term: 3, leader-id: 3, role: Leader, node-id: 3

Licenser avatar Nov 06 '19 20:11 Licenser

Hmmm, that's a bit of a puzzle, isn't it!

I see that index 5 is done:

Nov 07 12:17:02.582 INFO [commit: 5, term: 3] restored snapshot [index: 5, term: 3], snapshot_term: 3, snapshot_index: 5, commit: 5, term: 3, raft_id: 1

Hoverbear avatar Nov 07 '19 22:11 Hoverbear

Yup, but only on nodes 2 & 3; node 1 is still at 4 - I suspect 5 was the election of the new leader when 1 was killed, so it makes sense that it doesn't have it yet. I'm still unsure where it gets 6 from.

Licenser avatar Nov 07 '19 22:11 Licenser

Yup I misread and deleted the comment :-P

Hoverbear avatar Nov 07 '19 22:11 Hoverbear

:D ah sorry, I just saw the mail.

Licenser avatar Nov 07 '19 22:11 Licenser

@Licenser If you don't call curl -X POST 127.0.0.1:8081/node/1 in your reproduction case it seems to work without panicking... At least raft is trying to send the correct messages but it seems to be unable to.


Hoverbear avatar Nov 07 '19 22:11 Hoverbear

Thanks! That's an interesting observation. I didn't try that yet - it's a bit late here (close to midnight) but I'll give that a try tomorrow and keep digging.

The failure to send messages might be because the node was booted without knowing its peers; that's an easy thing to fix (famous last words), or at least easy to work around for the sake of getting the raft-rs use case working enough to have a minimal functioning kv store in it, by just passing all peers to all nodes with more -p flags on the command line (obviously not a production-grade solution 😂)

Licenser avatar Nov 07 '19 22:11 Licenser

Yeah, let's get this working! :) Would love to see more raft-rs derivatives and this is informing how we can better document things.

(Un)fortunately we're updating our membership change mechanisms (hello Joint Consensus), hopefully landing soon once we resolve some lingering safety issues with the etcd folks.

If you need help learning to use Raft @luciofranco made a simple KV here: https://github.com/LucioFranco/kv, and the TiKV Raftstore is a maximal example: https://github.com/tikv/tikv/tree/master/src/raftstore

Hoverbear avatar Nov 07 '19 22:11 Hoverbear

Learning Raft isn't really my problem; I'm not unfamiliar with it. Working through the assumptions in raft-rs is where I am stuck :(.

I looked at @LucioFranco's kv and as far as I can tell it suffers from the same problems.

bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."

I gave not adding node 1 a try, and it ends with the same issue of trying to access index 6; it just takes one more kill to get there (start1, start2, start3, join2, join3, kill1, start1, kill2 (was leader), join2 => tries to access index 6).

Licenser avatar Nov 08 '19 08:11 Licenser

So I've come full circle with this: I'm back at the original 'node already known' problem. The solution to that was to not load the ConfState from disk when restarting a node. That, on the other hand, causes nodes to forget their peers: restarting 1, it only comes up with voters 2 and 3 (but not itself); restarting 2, it only has 3 as a voter but neither 1 nor itself.

More generally, it seems that nodes only know about nodes added after them: node 1 after a restart knows about 2/3, node 2 after a restart knows about 3, and node 3 after a restart knows about none.

Licenser avatar Nov 08 '19 15:11 Licenser

bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."

At the join 1 step, why are you doing this? 1 was already added? Once a node is added to the set, it should only be removed via an explicit Remove. As you know from Raft, the leader (2 or 3) will continually try to reconnect to the lost node (1); once they manage to hear back, the node is recovered. It never needs to be removed unless you've actually explicitly given up on that node.

Here's how the process should go:

  • Create & start Raft node 1, it becomes leader.
  • Create Raft node 2 & 3 and idle them.
  • Inform Leader (1) of node 2 & 3 through an AddNode proposal. Let it run and hold an election.
  • The cluster is lively.
  • Kill 1
  • 2/3 hold an election, one wins.
  • 1 recovers, begins listening and ticking its election timeout.
  • Leader (2/3) either contacts it or hears from it and captures it. (the other follower would just redirect it to the current leader)
  • The cluster is lively again.

So you'd do remove/add when you were decommissioning a host, and if you were just taking a host down for maintenance or a reboot, you can just leave it in the membership pool. :)
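For the restart-after-maintenance case, a node that already joined should come back up from whatever HardState/ConfState it persisted rather than proposing AddNode again. A minimal sketch, assuming a raft-rs 0.6-era API (exact constructor and setter names vary between versions) and with load_conf_state / load_hard_state standing in for uring's own on-disk storage:

    // Restart a previously-joined node from its persisted state; no AddNode proposal.
    use raft::{storage::MemStorage, Config, RawNode};

    let conf_state = load_conf_state()?;          // e.g. voters {1, 2, 3}, saved earlier
    let hard_state = load_hard_state()?;          // term / vote / commit, saved earlier

    let storage = MemStorage::new_with_conf_state(conf_state);
    storage.wl().set_hardstate(hard_state);

    let cfg = Config { id: 1, ..Default::default() };
    let mut node = RawNode::new(&cfg, storage, &logger)?; // logger: an slog Logger
    // No propose_conf_change(AddNode) here: the voter set comes back from the
    // persisted ConfState, and the current leader simply resumes replicating to
    // this node once it is reachable again.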

Hoverbear avatar Nov 08 '19 16:11 Hoverbear

bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set."

At the join 1 step, why are you doing this? 1 was already added? Once a node is added to the set, it should only be removed via an explicit Remove. As you know from Raft, the leader (2 or 3) will continually try to reconnect to the lost node (1); once they manage to hear back, the node is recovered. It never needs to be removed unless you've actually explicitly given up on that node.

kv needs to be called with either join or bootstrap; there is no other way to start it. I'd rather not call either, but that isn't an option.

Here's how the process should go:

  • Create & start Raft node 1, it becomes leader.
  • Create Raft node 2 & 3 and idle them.
  • Inform Leader (1) of node 2 & 3 through an AddNode proposal. Let it run and hold an election.
  • The cluster is lively.
  • Kill 1
  • 2/3 hold an election, one wins.
  • 1 recovers, begins listening and ticking its election timeout.

This is where it gets problematic: after recovery the nodes either run into a "node already known" problem when the ConfState is persisted to disk and re-loaded, or they forget about the nodes. That's where I'm stuck at the moment.

  • Leader (2/3) either contacts it or hears from it and captures it. (the other follower would just redirect it to the current leader)
  • The cluster is lively again.

So you'd do remove/add when you were decommissioning a host, and if you were just taking a host down for maintenance or a reboot, you can just leave it in the membership pool. :)

I know, there are no calls to add outside of the PUT node/X and no calls to remove at all. That's the mystery.

Licenser avatar Nov 08 '19 17:11 Licenser

This is where it gets problematic: after recovery the nodes either run into a "node already known" problem when the ConfState is persisted to disk and re-loaded, or they forget about the nodes. That's where I'm stuck at the moment.

So when 1 recovers and loads from the hard state, it should already know about 1, 2, 3, no? Why is it adding nodes?

Hoverbear avatar Nov 08 '19 17:11 Hoverbear