dragonboat
Add Auto Promote Support for Observer
These days, when a user wants to add a new Raft node while avoiding unnecessary downtime, the first step is to add the new node as an observer, wait for it to catch up, and then manually promote it to a follower. This whole process could be automated in Dragonboat.
The detailed motivation is described in section 4.2.1 of the Raft paper, but this implementation takes a different approach. In the paper, the leader is responsible for tracking the observer's progress by monitoring the replication speed of new entries, and promotes the observer once a replication round completes within the election timeout, which suggests the gap is sufficiently small. That idea adds complexity to the leader logic and overhead to leadership transfer.
Instead, Dragonboat chooses a self-nomination approach in which the observer initiates readIndex calls to verify that it has sufficiently caught up with the leader. If we see consecutive rounds of readIndex succeed within the election timeout, the observer triggers a config change to add itself as a follower in the next round. This is a decentralized design that removes the dependency on the elected leader to decide the role, and is thus easier to reason about.
Codecov Report
Merging #109 into master will increase coverage by 0.75%. The diff coverage is 95.94%.
@@ Coverage Diff @@
## master #109 +/- ##
==========================================
+ Coverage 80.86% 81.61% +0.75%
==========================================
Files 12 13 +1
Lines 3589 3708 +119
==========================================
+ Hits 2902 3026 +124
+ Misses 450 446 -4
+ Partials 237 236 -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| requests.go | 88.87% <ø> (+0.94%) | :arrow_up: |
| nodehost.go | 77.48% <100%> (+0.1%) | :arrow_up: |
| node.go | 80.19% <88.24%> (+0.19%) | :arrow_up: |
| promotion_manager.go | 97.06% <97.06%> (ø) | |
| execengine.go | 83.25% <0%> (+1.05%) | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6487afa...4ba8ed4. Read the comment docs.
Thanks for the PR, good to see you tried to implement the idea I mentioned during the WeChat discussion. I think:

- execEngine should not be changed just to get some periodic tasks invoked; there are existing facilities for that kind of thing, e.g. NodeHost.tickWorkerMain
- the ICompletionHandler, as described in its godoc, is meant to be used by the language bindings only; initiating a new proposal in Notify() will cause a deadlock
- the current implementation keeps polling without any back-off when the node is not ready to be promoted, which is going to cause excessive traffic and leader overhead when there is a large number of observer nodes
- it would be useful to have some integration tests showing that auto promotion works at the NodeHost level - please have a quick look at the other observer related tests in nodehost_test.go for reference
- checking requestStartTime.Add(time.Duration(m.electionTimeout)).After(resultReceivedTime) is unnecessary
- using wall clock time is actually prohibited. You were probably forced to use wall clock time because there is no accessible internal tick based API for the read index and membership change operations in node.go/request.go. I will address this item 6 in a separate PR.
Thanks for the comments. For the integration tests, I was planning to address them after you took an initial look at the current logic. For #5, my question was: if we don't check the time taken to complete the readIndex, are we still confident that the lag between the observer and the leader is trivial?
Oh, I think I got it now: since we have already defined a readIndex timeout, as long as the call is not timing out we can assume it succeeded. As for #6, we do need a logical measurement of the lag, if possible, for the retry logic to be vigilant.
I think there also needs to be a mechanism to notify the user when the observer is auto promoted to a regular node, e.g. a callback function specified on the config object. The next question is whether that AutoPromote bool flag is still necessary - maybe specifying an AutoPromotionCompleted callback function on the config object implies that auto promotion is enabled.
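The suggestion above, deriving the enable flag from the callback's presence, could look like this. The field names are taken from the discussion (AutoPromotionCompleted) but the struct is a hypothetical sketch, not the actual dragonboat config:

```go
package main

import "fmt"

// Config sketches the discussed option: instead of a separate AutoPromote
// bool, providing an AutoPromotionCompleted callback implies that auto
// promotion is enabled. Hypothetical names, not the real dragonboat config.
type Config struct {
	NodeID                 uint64
	AutoPromotionCompleted func(nodeID uint64)
}

// autoPromoteEnabled derives the flag from the callback's presence,
// so there is no way to enable the feature without being notified.
func (c *Config) autoPromoteEnabled() bool {
	return c.AutoPromotionCompleted != nil
}

func main() {
	cfg := &Config{
		NodeID: 1,
		AutoPromotionCompleted: func(nodeID uint64) {
			fmt.Printf("node %d promoted to regular follower\n", nodeID)
		},
	}
	fmt.Println(cfg.autoPromoteEnabled()) // true
	cfg.AutoPromotionCompleted(cfg.NodeID)
}
```

One nice property of this design is that it removes an invalid configuration state: a user can never enable auto promotion while forgetting to handle the completion event.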
That also sounds like a valid solution. We could discuss further on WeChat to see whether that's necessary.
the above mentioned item 6 has been addressed in d72a5934d60810628731a405226e6db815d5239f
Thanks! Will rebase the change. However, I'm not sure we still need it.
In https://github.com/etcd-io/etcd/blob/9f81002a11b55c5f53b76b3c00b1e83529d0eaab/etcdserver/api/membership/cluster.go#L286, there is a flag in config change to indicate the request is for promotion. Do we also want to add such a safety check?
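The etcd-style safety check referenced above can be sketched as follows: the config change carries a promotion flag, and the request is rejected unless the target node is currently an observer. Types and names here are illustrative, not etcd's or dragonboat's actual ones:

```go
package main

import "fmt"

// member records only the property relevant to the check being discussed.
type member struct{ isObserver bool }

// confChange carries an explicit promotion flag, in the spirit of etcd's
// membership promotion request.
type confChange struct {
	nodeID    uint64
	isPromote bool
}

// validatePromotion rejects a promotion request whose target is unknown or
// is not currently an observer, acting as the safety check in question.
func validatePromotion(members map[uint64]member, cc confChange) error {
	m, ok := members[cc.nodeID]
	if !ok {
		return fmt.Errorf("node %d is not a member", cc.nodeID)
	}
	if cc.isPromote && !m.isObserver {
		return fmt.Errorf("node %d is not an observer, refusing promotion", cc.nodeID)
	}
	return nil
}

func main() {
	members := map[uint64]member{1: {isObserver: true}, 2: {isObserver: false}}
	fmt.Println(validatePromotion(members, confChange{nodeID: 1, isPromote: true}))
	fmt.Println(validatePromotion(members, confChange{nodeID: 2, isPromote: true}))
}
```

The flag makes promotion an explicit, validated request rather than an ordinary add-node config change, which guards against a stale or duplicated self-nomination promoting the wrong node.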
Codecov Report
Merging #109 into master will decrease coverage by 0.26%. The diff coverage is 95.94%.
@@ Coverage Diff @@
## master #109 +/- ##
==========================================
- Coverage 81.86% 81.61% -0.25%
==========================================
Files 13 13
Lines 4156 3708 -448
==========================================
- Hits 3402 3026 -376
+ Misses 486 446 -40
+ Partials 268 236 -32
| Impacted Files | Coverage Δ | |
|---|---|---|
| requests.go | 88.87% <ø> (+2.74%) | :arrow_up: |
| node.go | 80.19% <88.24%> (-0.83%) | :arrow_down: |
| promotion_manager.go | 97.06% <97.06%> (ø) | |
| nodehost.go | 77.48% <100.00%> (-1.34%) | :arrow_down: |
| event.go | 37.10% <0.00%> (-39.64%) | :arrow_down: |
| snapshotter.go | 74.68% <0.00%> (-0.32%) | :arrow_down: |
| vfs.go | | |
| ... and 2 more | | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 449368c...7af8eb2. Read the comment docs.