Add Auto Promote Support for Observer

Open · abbccdda opened this issue 4 years ago · 10 comments

These days, when a user wants to add a new raft node while avoiding unnecessary downtime, the first step is to add the new node as an observer, wait for it to catch up, and then manually promote it to a follower. This whole process could be automated in Dragonboat.
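For context, a rough sketch of what this manual flow looks like today, assuming the v3 NodeHost helpers SyncRequestAddObserver and SyncRequestAddNode; the catch-up check is left as an operator-defined placeholder:

```go
// Minimal sketch of the manual add-observer-then-promote flow, assuming the
// dragonboat v3 NodeHost API; waitForCatchUp is a stand-in for whatever
// check an operator uses today.
package autopromote

import (
	"context"
	"time"

	"github.com/lni/dragonboat/v3"
)

func addNodeWithoutDowntime(nh *dragonboat.NodeHost, clusterID, nodeID uint64, target string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	// Step 1: join as an observer, so the new node receives entries but
	// does not count towards quorum.
	if err := nh.SyncRequestAddObserver(ctx, clusterID, nodeID, target, 0); err != nil {
		return err
	}

	// Step 2: wait for the observer to catch up with the leader.
	waitForCatchUp()

	// Step 3: manually promote it to a regular (voting) member.
	ctx2, cancel2 := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel2()
	return nh.SyncRequestAddNode(ctx2, clusterID, nodeID, target, 0)
}

func waitForCatchUp() { /* operator-defined check, e.g. compare applied indexes */ }
```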

The detailed motivation is described in section 4.2.1 of the Raft paper, but this implementation takes a different approach. In the paper, the leader is responsible for tracking the observer's progress by monitoring how fast new entries are replicated, and promotes the observer once a replication round takes less than an election timeout, which suggests the remaining gap is sufficiently small. That idea adds complexity to the leader logic and overhead to leadership transfer.

Instead, Dragonboat takes a self-nomination approach: the observer initiates readIndex calls to confirm that it has sufficiently caught up with the leader. Once it sees consecutive rounds of readIndex succeed within an election timeout, the observer triggers a config change on the next round to add itself as a follower. This is a decentralized design that removes the dependency on the elected leader to decide the role, and is therefore easier to reason about.
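A minimal sketch of that self-nomination loop; the names and the consecutive-success threshold below are illustrative, not necessarily what the PR's promotion_manager.go does:

```go
// Illustrative observer-side self-nomination loop: count consecutive readIndex
// rounds that complete within an election timeout, then propose a config
// change to add this node as a voting member. Names and threshold are
// hypothetical, not lifted from the PR.
package autopromote

import "time"

type promotionManager struct {
	electionTimeout  time.Duration
	required         int                              // consecutive in-time rounds needed
	consecutive      int                              // current streak
	readIndexWithin  func(timeout time.Duration) bool // true if a readIndex completed within timeout
	proposePromotion func() error                     // config change: add self as a follower
}

// tick runs one probe round and triggers the promotion once the streak is long enough.
func (m *promotionManager) tick() error {
	if m.readIndexWithin(m.electionTimeout) {
		m.consecutive++
	} else {
		m.consecutive = 0
	}
	if m.consecutive >= m.required {
		m.consecutive = 0
		return m.proposePromotion()
	}
	return nil
}
```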

abbccdda · Nov 22 '19

Codecov Report

Merging #109 into master will increase coverage by 0.75%. The diff coverage is 95.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
+ Coverage   80.86%   81.61%   +0.75%     
==========================================
  Files          12       13       +1     
  Lines        3589     3708     +119     
==========================================
+ Hits         2902     3026     +124     
+ Misses        450      446       -4     
+ Partials      237      236       -1
Impacted Files          Coverage   Δ
requests.go              88.87%    <ø>       (+0.94%) ↑
nodehost.go              77.48%    <100%>    (+0.1%)  ↑
node.go                  80.19%    <88.24%>  (+0.19%) ↑
promotion_manager.go     97.06%    <97.06%>  (ø)
execengine.go            83.25%    <0%>      (+1.05%) ↑

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 6487afa...4ba8ed4.

codecov-io · Nov 22 '19

Thanks for the PR, good to see you tried to implement the idea I mentioned during the WeChat discussion. I think:

  1. execEngine should not be changed just to get some periodic tasks invoked; there are existing facilities for that kind of work, e.g. NodeHost.tickWorkerMain.

  2. ICompletionHandler, as described in its godoc, is meant to be used by the language bindings only; initiating a new proposal in Notify() will cause a deadlock.

  3. The current implementation keeps polling without any backoff when the node is not ready to be promoted. This is going to cause excessive traffic and leader overhead when there is a large number of observer nodes (see the backoff sketch after this list).

  4. It would be useful to have some integration tests showing that auto promotion works at the NodeHost level; please have a quick look at the other observer-related tests in nodehost_test.go for reference.

  5. Checking requestStartTime.Add(time.Duration(m.electionTimeout)).After(resultReceivedTime) is unnecessary.

  6. Using wall clock time is actually prohibited. You were probably forced to use it because there is no accessible internal tick-based API for the readIndex and membership change operations in node.go/request.go. I will address this item in a separate PR.
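Regarding item 3, a sketch of what backoff between probe rounds could look like; the function name and bounds are illustrative only, not taken from the PR:

```go
package autopromote

import "time"

// nextProbeDelay doubles the wait between readIndex probes, capped at a
// maximum, so an observer that is not yet ready to promote does not keep
// hammering the leader (review item 3). The bounds are illustrative.
func nextProbeDelay(prev time.Duration) time.Duration {
	const (
		minDelay = 100 * time.Millisecond
		maxDelay = 30 * time.Second
	)
	if prev < minDelay {
		return minDelay
	}
	if next := 2 * prev; next < maxDelay {
		return next
	}
	return maxDelay
}
```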

lni · Nov 24 '19

Thanks for the comments. For the integration tests, I was planning to address them after you took an initial look at the current logic. For item 5, my question was: if we don't check how long the readIndex took to complete, are we still confident that the lag between the observer and the leader is trivial?

abbccdda · Nov 25 '19

Oh, I think I get it now: since we already define a readIndex timeout, as long as the call doesn't time out we can treat that round as successful. As for item 6, we do need a logical (tick-based) measure of the lag if possible, so the retry logic doesn't depend on wall clock time.
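In other words, the readIndex timeout itself can serve as the per-round success criterion. A sketch, assuming dragonboat v3's NodeHost.ReadIndex and the RequestState.CompletedC result channel:

```go
package autopromote

import (
	"time"

	"github.com/lni/dragonboat/v3"
)

// caughtUpThisRound treats an observer as caught up for one probe round if a
// readIndex completes (rather than times out) within the given timeout, e.g.
// the election timeout. Assumes the v3 NodeHost.ReadIndex API, where the
// result is delivered on the request state's CompletedC channel.
func caughtUpThisRound(nh *dragonboat.NodeHost, clusterID uint64, timeout time.Duration) bool {
	rs, err := nh.ReadIndex(clusterID, timeout)
	if err != nil {
		return false
	}
	result := <-rs.CompletedC
	return result.Completed()
}
```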

abbccdda · Nov 25 '19

I think there also needs to be a mechanism to notify the user when the observer is auto promoted to a regular node, e.g. a callback function specified on the config object. The next question is whether the AutoPromote bool flag is still necessary: specifying an AutoPromotionCompleted callback on the config object could itself imply that auto promotion is enabled.
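A sketch of what that could look like; the AutoPromotionCompleted field and helper below are hypothetical and not part of dragonboat's actual config.Config:

```go
package autopromote

// Hypothetical additions to the node's config, sketching the suggestion above:
// the presence of an AutoPromotionCompleted callback could itself imply that
// auto promotion is enabled, making a separate AutoPromote bool redundant.
// None of this is dragonboat's actual config.Config.
type Config struct {
	// ... existing fields (NodeID, ClusterID, ElectionRTT, ...)

	// AutoPromotionCompleted, when non-nil, enables auto promotion and is
	// invoked once the observer has been promoted to a regular node.
	AutoPromotionCompleted func(clusterID uint64, nodeID uint64)
}

func autoPromoteEnabled(cfg Config) bool {
	return cfg.AutoPromotionCompleted != nil
}
```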

lni · Nov 27 '19

That also sounds like a valid solution. We could discuss further on WeChat to see whether that's necessary.

abbccdda · Nov 30 '19

The above mentioned item 6 has been addressed in d72a5934d60810628731a405226e6db815d5239f.

lni · Dec 03 '19

Thanks! Will rebase the change. However, I'm not sure we still need it.

abbccdda · Dec 05 '19

In https://github.com/etcd-io/etcd/blob/9f81002a11b55c5f53b76b3c00b1e83529d0eaab/etcdserver/api/membership/cluster.go#L286, there is a flag in the config change indicating that the request is a promotion. Do we also want to add such a safety check?
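A sketch of the kind of safety check being suggested, modeled loosely on the linked etcd code; the IsPromote field and the helper are hypothetical, not dragonboat's actual membership or config change types:

```go
package autopromote

import "errors"

// Hypothetical promotion-flagged config change and the safety check it
// enables: a promotion request is rejected unless the target node is
// currently an observer. Modeled loosely on the linked etcd code, not on
// dragonboat's actual membership types.
type membershipChange struct {
	NodeID    uint64
	IsPromote bool
}

func checkPromotion(cc membershipChange, observers map[uint64]bool) error {
	if cc.IsPromote && !observers[cc.NodeID] {
		return errors.New("can only promote a node that is currently an observer")
	}
	return nil
}
```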

abbccdda · Jan 13 '20

Codecov Report

Merging #109 into master will decrease coverage by 0.26%. The diff coverage is 95.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage   81.86%   81.61%   -0.25%     
==========================================
  Files          13       13              
  Lines        4156     3708     -448     
==========================================
- Hits         3402     3026     -376     
+ Misses        486      446      -40     
+ Partials      268      236      -32     
Impacted Files          Coverage   Δ
requests.go              88.87%    <ø>        (+2.74%)  ↑
node.go                  80.19%    <88.24%>   (-0.83%)  ↓
promotion_manager.go     97.06%    <97.06%>   (ø)
nodehost.go              77.48%    <100.00%>  (-1.34%)  ↓
event.go                 37.10%    <0.00%>    (-39.64%) ↓
snapshotter.go           74.68%    <0.00%>    (-0.32%)  ↓
vfs.go
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 449368c...7af8eb2.

codecov-commenter · Jun 13 '20