Dynamic Membership Changes

Open Hoverbear opened this issue 9 years ago • 6 comments

Support dynamic membership changes as specified in the dissertation (see README.md). This is a simpler choice than the joint consensus approach described in the paper.

Notes from the dissertation:

Safety

  • [ ] When a server receives a configuration change request, it should append the new configuration to its log and replicate the entry normally.
  • [ ] The new configuration takes effect on each server as soon as it is added to that server's log. It does not wait for a commit.
  • [ ] Servers always use the latest configuration from their logs.
  • [ ] The leader should only respond to the client once a majority has committed the configuration change.
  • [ ] Configuration changes can only happen one at a time. This means that another configuration change should only start once the previous one has been committed.
  • [ ] Handle the case where leadership changes and a configuration change gets rolled back. (The server should fall back to the previous configuration.)
  • [ ] A server should accept AppendEntries requests from a leader that is not part of the server's latest configuration. This is because the receiving server may be new and may not yet have the log entry that adds it to the cluster.
  • [ ] The same applies to RequestVote: a server should be able to grant its vote to a candidate outside its latest configuration. This may occasionally be needed to keep the cluster available.
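
A minimal sketch of the log-derived configuration rules above, not taken from this crate. The names `Entry`, `Log`, `latest_config`, `propose_config`, and `truncate` are all hypothetical, and the sketch assumes each configuration entry carries the full server set:

```rust
use std::collections::BTreeSet;

/// Hypothetical server identifier; the real crate may use a different type.
type ServerId = u64;

#[derive(Clone, Debug, PartialEq)]
enum Entry {
    /// An ordinary client command (payload omitted in this sketch).
    Command,
    /// A membership change carrying the full new server set (an assumption).
    Config(BTreeSet<ServerId>),
}

struct Log {
    entries: Vec<Entry>,
    /// Number of entries known to be committed.
    commit_len: usize,
    /// Configuration the cluster was started with.
    initial: BTreeSet<ServerId>,
}

impl Log {
    /// The latest configuration in the log, committed or not.
    fn latest_config(&self) -> &BTreeSet<ServerId> {
        self.entries
            .iter()
            .rev()
            .find_map(|e| match e {
                Entry::Config(c) => Some(c),
                Entry::Command => None,
            })
            .unwrap_or(&self.initial)
    }

    /// True if some configuration entry is still uncommitted.
    fn config_change_pending(&self) -> bool {
        self.entries[self.commit_len..]
            .iter()
            .any(|e| matches!(e, Entry::Config(_)))
    }

    /// Leader side: append a new configuration unless one is already in flight.
    fn propose_config(&mut self, new: BTreeSet<ServerId>) -> Result<(), &'static str> {
        if self.config_change_pending() {
            return Err("previous configuration change not yet committed");
        }
        self.entries.push(Entry::Config(new));
        Ok(())
    }

    /// Drop conflicting entries after a leadership change. Because the
    /// configuration is always read from the log, it falls back on its own.
    fn truncate(&mut self, len: usize) {
        self.entries.truncate(len);
        self.commit_len = self.commit_len.min(len);
    }
}
```

Deriving the configuration from the log on every read, rather than caching it, means a rolled-back change needs no special handling: once the entry is truncated, `latest_config` returns the previous set again.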

Availability

  • [ ] Availability suffers when a new server's log is not up to date. An additional "catch up" phase should be used, in which the leader replicates entries to the new server but does not count it as a voting member.
  • [ ] It's noted that this non-voting characteristic might be useful in some implementations.
  • [ ] The leader must determine when the new server is sufficiently caught up to continue. "Round based" catch-up, as detailed in the last paragraph of page 38 of the dissertation, is a good way of doing this.
  • [ ] Temporary unavailability of the cluster due to changes should be less than a heartbeat timeout.
  • [ ] The leader must abort the change if the new server is too slow or unavailable. (Adding such a server risks disrupting the cluster.)
  • [ ] Include a test that tries to add an unavailable SocketAddr and checks that the change fails.
  • [ ] When adding a new server, it can take some time before the next_index counter finally drops to 1 and the log starts replicating. It is suggested that followers include the length of their logs in the AppendEntries response so that the leader can cap next_index.
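
A rough sketch of the round-based catch-up decision and the next_index cap described above. It assumes a round ends once the new server has acknowledged every entry that was in the leader's log when the round began, and that the leader compares the round's duration against an election timeout. `CatchUpTracker`, `CatchUp`, `max_rounds`, and `next_index_after_reject` are hypothetical names, and the thresholds are parameters rather than values from the dissertation:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CatchUp {
    /// The last round was short enough; the configuration entry may be appended.
    Ready,
    /// Keep replicating and start another round.
    Continue,
    /// Too many slow rounds; abort the membership change.
    Abort,
}

/// Tracks catch-up rounds for one non-voting server (hypothetical type).
struct CatchUpTracker {
    rounds_done: u32,
    max_rounds: u32,            // assumed limit, chosen by the caller
    election_timeout: Duration, // assumed threshold for a "fast" round
}

impl CatchUpTracker {
    fn new(max_rounds: u32, election_timeout: Duration) -> Self {
        CatchUpTracker { rounds_done: 0, max_rounds, election_timeout }
    }

    /// Call when the new server has acknowledged every entry that was in the
    /// leader's log at the start of the round; `took` is the round's duration.
    fn round_finished(&mut self, took: Duration) -> CatchUp {
        self.rounds_done += 1;
        if took < self.election_timeout {
            CatchUp::Ready
        } else if self.rounds_done >= self.max_rounds {
            CatchUp::Abort
        } else {
            CatchUp::Continue
        }
    }
}

/// Leader side: pick the next index to try after a rejected AppendEntries,
/// using the follower's reported log length to skip ahead instead of
/// decrementing one entry at a time. Indexes are 1-based.
fn next_index_after_reject(current_next: u64, follower_log_len: u64) -> u64 {
    current_next
        .saturating_sub(1)
        .min(follower_log_len + 1)
        .max(1)
}
```

With an empty follower log, `next_index_after_reject(1000, 0)` jumps straight to 1 rather than taking 999 rejected round trips.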

Removing the Current Leader

  • [ ] A Leader Transfer Extension is described as the most straightforward approach and may have other useful applications.

Disruptive Servers

  • [ ] It's possible for a server that is removed from a cluster to disrupt it by continuing to trigger elections, resulting in poor availability. It is suggested that the RequestVote RPC is modified such that:
    • If a server receives a RequestVote request within the minimum election timeout of hearing from a current leader, it does not update its term or grant a vote. Dropping the request, replying that it is invalid, or delaying the response are all fine.
  • [ ] This may conflict with the Leader Transfer Extension. To handle that case, RequestVote requests sent during a leadership transfer should carry a special flag.
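
A minimal sketch of the RequestVote check described above, assuming the server records when it last heard from a leader. `VoteRequest`, `Follower`, the `leadership_transfer` flag, and both method names are hypothetical; the log up-to-date check and vote bookkeeping are left out:

```rust
use std::time::{Duration, Instant};

struct VoteRequest {
    term: u64,
    /// Hypothetical flag set by a candidate during a leadership transfer.
    leadership_transfer: bool,
}

struct Follower {
    current_term: u64,
    /// When this server last heard from a current leader, if ever.
    last_leader_contact: Option<Instant>,
    min_election_timeout: Duration,
}

impl Follower {
    /// True if the request should be processed normally, false if ignored.
    fn should_consider_vote(&self, req: &VoteRequest, now: Instant) -> bool {
        if req.leadership_transfer {
            // The flag bypasses the check so a transfer can proceed.
            return true;
        }
        match self.last_leader_contact {
            Some(t) => now.saturating_duration_since(t) >= self.min_election_timeout,
            None => true,
        }
    }

    /// Returns whether the request was processed. An ignored request leaves
    /// the term untouched, which is what stops a removed server from
    /// forcing new elections.
    fn process_request_vote(&mut self, req: &VoteRequest, now: Instant) -> bool {
        if !self.should_consider_vote(req, now) {
            return false;
        }
        if req.term > self.current_term {
            self.current_term = req.term;
        }
        // The usual log comparison and vote granting would follow here.
        true
    }
}
```

The important detail is that the term update happens after the check: if the term were adopted first, the disruptive server would still depose the leader even when no vote is granted.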

Hoverbear avatar Jul 31 '15 22:07 Hoverbear

Note that membership changes are simpler in the dissertation, but don't forget about this minor gotcha: https://groups.google.com/forum/#!topic/raft-dev/t4xj6dJTP6E

ongardie avatar Aug 01 '15 20:08 ongardie

@ongardie Would you suggest using one (the one in the paper) over the other (the one in your dissertation)?

Hoverbear avatar Aug 03 '15 18:08 Hoverbear

I'd suggest the single-server approach (dissertation), unless you have a good reason otherwise. (LogCabin still uses the joint consensus approach but just because I never went back to update the code.)

ongardie avatar Aug 03 '15 19:08 ongardie

@ongardie I was reviewing this all today (made some notes above) and was wondering if you've found other applications for the Leader Transfer extension? The other suggested methods of removing the current leader seem rather complicated.

Hoverbear avatar Aug 04 '15 18:08 Hoverbear

Nice summary, @hoverbear, but don't forget to add the fix for the bug I linked to in my first comment!

I don't think I've learned more uses for leadership transfer in the last year since my dissertation was published, so my thoughts in 3.10 are still current. Overall, I think it'd be cleaner and more useful to implement the leadership transfer approach and then use it when removing the cluster leader. But I also have to admit I still have never implemented leadership transfer myself (maybe others on raft-dev have?).

ongardie avatar Aug 04 '15 21:08 ongardie

I think it's also worth outlining adding a member from the perspective of the new member. A Raft instance must respond to anyone trying to talk to it, and there must be a way to "boot" it without it turning into a one-group node. Instead it should just sit there and wait for incoming messages, in case it is added to an existing group. This "talking to strangers" is new with config changes and at the heart of why they're probably the most complex and error-prone part of Raft.
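
One way to picture the passive boot described here, as a sketch only: the node starts with an empty configuration and refuses to campaign until a configuration that includes it arrives. `Node`, `bootstrap`, `join_later`, `may_start_election`, and `apply_config` are hypothetical names, not this crate's API:

```rust
use std::collections::BTreeSet;

/// Hypothetical server identifier.
type ServerId = u64;

struct Node {
    id: ServerId,
    /// Empty until a leader replicates a configuration entry to this node.
    config: BTreeSet<ServerId>,
}

impl Node {
    /// Boot as the first member of a brand-new cluster.
    fn bootstrap(id: ServerId) -> Self {
        Node { id, config: BTreeSet::from([id]) }
    }

    /// Boot passively, waiting to be added to an existing cluster.
    fn join_later(id: ServerId) -> Self {
        Node { id, config: BTreeSet::new() }
    }

    /// On election timeout: only campaign if our own configuration includes us.
    /// A passive node therefore never forms a one-node group by accident.
    fn may_start_election(&self) -> bool {
        self.config.contains(&self.id)
    }

    /// Called when a configuration entry is appended to this node's log.
    fn apply_config(&mut self, config: BTreeSet<ServerId>) {
        self.config = config;
    }
}
```

A node created with `join_later` still has to answer AppendEntries and RequestVote from servers it has never heard of; the sketch only covers the "do not elect yourself" half.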

tbg avatar Aug 15 '15 15:08 tbg