old-raft-rs
Dynamic Membership Changes
Support dynamic membership changes as specified in the dissertation (see README.md). This is a simpler choice.
Notes from the dissertation:
Safety
- [ ] When a server receives a request, it should append the new configuration to its log and replicate the entry normally.
- [ ] The new configuration takes effect on each server as soon as it is added. It does not wait for a commit.
- [ ] Servers always use the latest configuration from their logs.
- [ ] The `Leader` should only respond to the client once a majority has committed the configuration change.
- [ ] Configuration changes can only happen one at a time: another configuration change may only begin once the previous one has been committed.
- [ ] Handle the case where leadership changes and a configuration change gets rolled back. (The server should fall back to the previous configuration.)
- [ ] A server should accept `AppendEntries` requests from a leader that is not part of the server's latest configuration, because the server may not yet have the entry that adds it to the cluster.
- [ ] Same as above for `RequestVote`; this may occasionally be needed to keep the cluster available.
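A minimal Rust sketch of the safety rules above, assuming a simple in-memory log. The `Entry`, `Log`, and `ServerId` names are hypothetical, not this crate's API. The active configuration is recomputed from the latest configuration entry in the log, so it takes effect as soon as it is appended, it is rolled back automatically when a new leader truncates the uncommitted entry, and a new change is only allowed once the previous one is committed.

```rust
use std::collections::BTreeSet;

// Hypothetical types for illustration; not the crate's actual API.
type ServerId = u64;

#[derive(Clone, Debug, PartialEq)]
enum Entry {
    Command(Vec<u8>),
    Config(BTreeSet<ServerId>),
}

struct Log {
    // (term, entry); entries[i] holds log index i + 1.
    entries: Vec<(u64, Entry)>,
    commit_index: u64,
    // Configuration used when the log holds no configuration entry.
    initial_config: BTreeSet<ServerId>,
}

impl Log {
    /// Servers use the latest configuration in their log, committed or not.
    fn latest_config(&self) -> &BTreeSet<ServerId> {
        self.entries
            .iter()
            .rev()
            .find_map(|(_, e)| match e {
                Entry::Config(c) => Some(c),
                _ => None,
            })
            .unwrap_or(&self.initial_config)
    }

    /// 1-based index of the latest configuration entry, if there is one.
    fn latest_config_index(&self) -> Option<u64> {
        self.entries
            .iter()
            .rposition(|(_, e)| matches!(e, Entry::Config(_)))
            .map(|i| i as u64 + 1)
    }

    /// Only one change at a time: the previous one must be committed first.
    fn can_propose_config_change(&self) -> bool {
        self.latest_config_index()
            .map_or(true, |idx| idx <= self.commit_index)
    }

    /// Drop entries at `index` and beyond (e.g. conflicting entries from a new
    /// leader). Because the configuration is recomputed from the log, an
    /// uncommitted change is rolled back automatically.
    fn truncate_from(&mut self, index: u64) {
        self.entries.truncate(index.saturating_sub(1) as usize);
    }
}

fn main() {
    let initial = BTreeSet::from([1, 2, 3]);
    let mut log = Log { entries: vec![], commit_index: 0, initial_config: initial.clone() };
    log.entries.push((1, Entry::Command(b"x".to_vec())));
    log.commit_index = 1;

    // Append a change adding server 4: it takes effect before it commits.
    let mut added = initial.clone();
    added.insert(4);
    log.entries.push((1, Entry::Config(added.clone())));
    assert_eq!(log.latest_config(), &added);
    assert!(!log.can_propose_config_change());

    // A new leader overwrites index 2, so the change is rolled back.
    log.truncate_from(2);
    assert_eq!(log.latest_config(), &initial);
    assert!(log.can_propose_config_change());
}
```

Deriving the configuration from the log, rather than storing it separately, is what makes the rollback case fall out for free.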
Availability
- [ ] Availability suffers when a new server's log is not up to date. An additional "catch-up" phase should be used, in which the leader replicates entries to the new server but does not count it as a voting member.
- [ ] The dissertation notes that this non-voting role might be useful in some implementations.
- [ ] The `Leader` must determine when the new server is sufficiently caught up to continue. "Round-based" catch-up, described in the last paragraph of page 38 of the dissertation, is a good way of doing this.
- [ ] Temporary unavailability of the cluster due to changes should be less than a heartbeat timeout.
- [ ] The `Leader` must abort the change if the new server is too slow or unavailable. (Otherwise, the change risks disrupting the cluster.)
- [ ] Include a test that tries to add an unavailable `SocketAddr`, to exercise the failure path.
- [ ] When adding a new server, it can take some time before the `next_index` counter finally drops to 1 and the log starts replicating. It is suggested that `Follower`s include the length of their logs in the `AppendEntries` response, so that the `Leader` can cap `next_index` accordingly.
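A minimal sketch of the catch-up phase, assuming the round-based scheme as we read it from the dissertation: the leader replicates to the new, non-voting server in rounds, and after a fixed number of rounds it proposes adding the server only if the last round finished within an election timeout; an unreachable server aborts the change. It also shows the suggested `next_index` cap using a follower-reported log length. `CatchUpTracker`, `Progress`, `next_index_after_reject`, and the log-length response field are all hypothetical.

```rust
use std::time::Duration;

/// Result of finishing one catch-up round (hypothetical type).
#[derive(Debug, PartialEq)]
enum Progress {
    InProgress, // keep replicating to the non-voting server
    Ready,      // propose the configuration entry that adds the server
    Abort,      // new server too slow or unavailable: fail the change
}

struct CatchUpTracker {
    rounds_done: usize,
    max_rounds: usize, // e.g. 10
    election_timeout: Duration,
}

impl CatchUpTracker {
    /// Record one round. `None` means the new server never answered.
    fn finish_round(&mut self, took: Option<Duration>) -> Progress {
        let took = match took {
            Some(d) => d,
            None => return Progress::Abort,
        };
        self.rounds_done += 1;
        if self.rounds_done < self.max_rounds {
            return Progress::InProgress;
        }
        // After the last round, only a fast final round means it has caught up.
        if took < self.election_timeout {
            Progress::Ready
        } else {
            Progress::Abort
        }
    }
}

/// After a rejected AppendEntries, step `next_index` back, but jump straight
/// to just past the end of the follower's log using the length it reported
/// (a hypothetical response field), instead of decrementing once per RPC.
fn next_index_after_reject(next_index: u64, follower_log_len: u64) -> u64 {
    next_index
        .saturating_sub(1)
        .min(follower_log_len + 1)
        .max(1)
}

fn main() {
    let mut t = CatchUpTracker {
        rounds_done: 0,
        max_rounds: 3,
        election_timeout: Duration::from_millis(150),
    };
    assert_eq!(t.finish_round(Some(Duration::from_millis(900))), Progress::InProgress);
    assert_eq!(t.finish_round(Some(Duration::from_millis(300))), Progress::InProgress);
    assert_eq!(t.finish_round(Some(Duration::from_millis(40))), Progress::Ready);

    // A brand-new, empty server: skip straight to index 1.
    assert_eq!(next_index_after_reject(1001, 0), 1);
    assert_eq!(next_index_after_reject(1001, 500), 501);
}
```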
Removing the Current Leader
- [ ] A Leader Transfer Extension is described as the most straightforward approach and may have other useful applications.
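A minimal sketch of the Leader Transfer Extension as we understand it from section 3.10 of the dissertation: the leader stops accepting client requests, brings the target's log fully up to date through normal replication, then sends it a `TimeoutNow` message so it starts an election immediately. If the transfer does not complete within an election timeout, the leader aborts it and resumes serving clients. All types and names here are hypothetical.

```rust
use std::collections::HashMap;

type ServerId = u64; // hypothetical

#[derive(Debug, PartialEq)]
enum Msg {
    // Keep replicating to `to`, starting at `from_index`.
    AppendEntries { to: ServerId, from_index: u64 },
    // Tell `to` to start an election right away.
    TimeoutNow { to: ServerId },
}

struct Leader {
    last_log_index: u64,
    match_index: HashMap<ServerId, u64>,
    transfer_target: Option<ServerId>,
}

impl Leader {
    /// Step 1: stop accepting new client requests during a transfer.
    fn accepts_client_requests(&self) -> bool {
        self.transfer_target.is_none()
    }

    fn start_transfer(&mut self, target: ServerId) -> Msg {
        self.transfer_target = Some(target);
        let matched = self.match_index.get(&target).copied().unwrap_or(0);
        if matched >= self.last_log_index {
            Msg::TimeoutNow { to: target }
        } else {
            // Step 2: catch the target up using normal replication.
            Msg::AppendEntries { to: target, from_index: matched + 1 }
        }
    }

    /// Step 3: once the target's log matches ours, send TimeoutNow.
    fn on_append_ack(&mut self, from: ServerId, match_idx: u64) -> Option<Msg> {
        self.match_index.insert(from, match_idx);
        match self.transfer_target {
            Some(t) if t == from && match_idx >= self.last_log_index => {
                Some(Msg::TimeoutNow { to: t })
            }
            _ => None,
        }
    }

    /// Called if the transfer has not completed within an election timeout.
    fn abort_transfer(&mut self) {
        self.transfer_target = None;
    }
}

fn main() {
    let mut leader = Leader {
        last_log_index: 10,
        match_index: HashMap::from([(2, 7)]),
        transfer_target: None,
    };
    assert_eq!(leader.start_transfer(2), Msg::AppendEntries { to: 2, from_index: 8 });
    assert!(!leader.accepts_client_requests());
    assert_eq!(leader.on_append_ack(2, 10), Some(Msg::TimeoutNow { to: 2 }));
    // If no new leader emerges in time, give up and resume serving clients.
    leader.abort_transfer();
    assert!(leader.accepts_client_requests());
}
```

Removing the current leader could then be done by transferring leadership first and removing the old leader as an ordinary member.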
Disruptive Servers
- [ ] A server that has been removed from a cluster can disrupt it by continuing to trigger elections, resulting in poor availability. The dissertation suggests modifying the `RequestVote` RPC so that:
  - If a server receives a `RequestVote` request within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote. Dropping the request, replying that it is invalid, or delaying the response are all acceptable.
- [ ] This may conflict with the Leader Transfer Extension, in which the transfer target must start an election while the current leader is still active. To handle this, `RequestVote` requests sent under that condition should carry a special flag.
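A minimal sketch of the modified `RequestVote` check, assuming the server records when it last heard from a current leader. The `leadership_transfer` field stands in for the special flag mentioned above; it and all other names are hypothetical. The check runs before any term update, so an ignored request cannot bump the server's term.

```rust
use std::time::{Duration, Instant};

/// Hypothetical RequestVote arguments; `leadership_transfer` is the special flag.
struct RequestVote {
    term: u64,
    leadership_transfer: bool,
}

#[derive(Debug, PartialEq)]
enum Screen {
    Ignore,   // drop it: no term update, no vote
    Evaluate, // continue with the normal term and log checks
}

struct Server {
    current_term: u64,
    last_heard_from_leader: Option<Instant>,
    min_election_timeout: Duration,
}

impl Server {
    fn screen_request_vote(&self, req: &RequestVote, now: Instant) -> Screen {
        // A transfer target is explicitly allowed to disrupt the current leader.
        if req.leadership_transfer {
            return Screen::Evaluate;
        }
        match self.last_heard_from_leader {
            // A current leader is believed to exist: ignore the (possibly removed) candidate.
            Some(t) if now.saturating_duration_since(t) < self.min_election_timeout => {
                Screen::Ignore
            }
            _ => Screen::Evaluate,
        }
    }

    fn handle_request_vote(&mut self, req: &RequestVote, now: Instant) -> Screen {
        let s = self.screen_request_vote(req, now);
        if s == Screen::Evaluate && req.term > self.current_term {
            // Only after screening is it safe to adopt the higher term.
            self.current_term = req.term;
        }
        s
    }
}

fn main() {
    let base = Instant::now();
    let mut s = Server {
        current_term: 5,
        last_heard_from_leader: Some(base),
        min_election_timeout: Duration::from_millis(150),
    };
    let req = RequestVote { term: 9, leadership_transfer: false };
    // 50 ms after hearing from the leader: ignored, term unchanged.
    assert_eq!(s.handle_request_vote(&req, base + Duration::from_millis(50)), Screen::Ignore);
    assert_eq!(s.current_term, 5);
    // 200 ms later the leader may be gone: evaluate normally.
    assert_eq!(s.handle_request_vote(&req, base + Duration::from_millis(200)), Screen::Evaluate);
    assert_eq!(s.current_term, 9);
}
```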
Note that membership changes are simpler in the dissertation, but don't forget about this minor gotcha: https://groups.google.com/forum/#!topic/raft-dev/t4xj6dJTP6E
@ongardie Would you suggest using one approach (the one in the paper) over the other (the one in your dissertation)?
I'd suggest the single-server approach (dissertation), unless you have a good reason otherwise. (LogCabin still uses the joint consensus approach but just because I never went back to update the code.)
@ongardie I was reviewing this all today (made some notes above) and was wondering if you've found other applications for the Leader Transfer extension? The other suggested methods of removing the current leader seem rather complicated.
Nice summary, @hoverbear, but don't forget to add the fix for the bug I linked to in my first comment!
I don't think I've learned more uses for leadership transfer in the last year since my dissertation was published, so my thoughts in 3.10 are still current. Overall, I think it'd be cleaner and more useful to implement the leadership transfer approach and then use it when removing the cluster leader. But I also have to admit I still have never implemented leadership transfer myself (maybe others on raft-dev have?).
I think it's also worth outlining adding a member from the perspective of the new member. A Raft instance must respond to anyone trying to talk to it, and there must be a way to "boot" it without it turning into a single-node group. Instead, it should just sit there and wait for incoming messages, in case it is added to an existing group. This "talking to strangers" is new with config changes and at the heart of why they're probably the most complex and error-prone part of Raft.
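One way to boot a new member without it turning into a single-node group, as a minimal sketch under these assumptions: a joining server starts with an empty configuration, only the very first server of a brand-new cluster starts with a configuration containing itself, and a node never starts an election unless its latest configuration includes it. It still adopts a configuration entry from any leader, even one it has never heard of. `Node` and its methods are hypothetical.

```rust
use std::collections::BTreeSet;

type ServerId = u64; // hypothetical

#[derive(Debug, PartialEq)]
enum OnTimeout {
    StartElection,
    WaitQuietly,
}

struct Node {
    id: ServerId,
    // Latest configuration from the log; empty until a leader sends one.
    config: BTreeSet<ServerId>,
}

impl Node {
    /// A server that will be added to an existing cluster.
    fn new_joining(id: ServerId) -> Self {
        Node { id, config: BTreeSet::new() }
    }

    /// The first server of a brand-new cluster, configured with just itself.
    fn new_bootstrap(id: ServerId) -> Self {
        Node { id, config: BTreeSet::from([id]) }
    }

    fn on_election_timeout(&self) -> OnTimeout {
        if self.config.contains(&self.id) {
            OnTimeout::StartElection
        } else {
            // Not a voting member (yet): just wait for messages from a leader.
            OnTimeout::WaitQuietly
        }
    }

    /// Adopt a configuration entry appended by any leader, even one this
    /// node has never heard of ("talking to strangers").
    fn on_config_entry(&mut self, config: BTreeSet<ServerId>) {
        self.config = config;
    }
}

fn main() {
    let mut joiner = Node::new_joining(4);
    assert_eq!(joiner.on_election_timeout(), OnTimeout::WaitQuietly);
    joiner.on_config_entry(BTreeSet::from([1, 2, 3, 4]));
    assert_eq!(joiner.on_election_timeout(), OnTimeout::StartElection);
    assert_eq!(Node::new_bootstrap(1).on_election_timeout(), OnTimeout::StartElection);
}
```

A server still in the non-voting catch-up phase is not yet in any configuration it has received, so under this sketch it also stays quiet.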