Draft: support concurrent replaces within the mirrored placement code
This is very WIP; I'm putting it up for @abliqo and potentially @prateek to look at, but it's otherwise too messy to really review (and, importantly, mostly untested ;)).
This implements a fix for https://github.com/m3db/m3/issues/2850.
The high level approach:
Limit the instances we mark available to those affected by the replace, i.e. the leaving (replaced) instances. This allows replaces that operate on independent shard sets to proceed concurrently.
What this means in practice is that we:
- Look up the shard set pair for the node being replaced
- For each node in that shard set, try to mark shards available on it
Marking shards available on a node means:
- For each initializing shard, find the corresponding leaving shard; if past cutoff, remove the leaving shard and mark the initializing shard available
- For each leaving shard, find the corresponding initializing shard, then proceed as above
One other bit of nuance here:
The mirrored algorithm has special casing around reclaiming shards: any case where a node left the cluster and was then added back before cutovers/cutoffs have passed. This path also ended up needing some changes here.
The reclaim code goes through the entire placement and pulls back any shard whose sourceID matches the node being readded. For instance:
nodeA: {id = 1, state = INIT, source = nodeB}
nodeB: {id = 1, state = LEAVING}
If nodeB is readded to the placement, it will reclaim shard 1.
The previous replace code relies on the nodes being operated on being the only ones in flux. That is, if you have:
nodeA: {id = 1, state = INIT, source = nodeB}
nodeB: {id = 1, state = LEAVING}
nodeC: {id = 2, state = INIT, source = nodeD}
nodeD: {id = 2, state = LEAVING}
it will refuse to reclaim nodeB's shards, even though it theoretically could. I think the current check is overly conservative; a node A should be able to reclaim shards from a node B if:
// The leaving instances were a replacement for the adding instances iff:
// - all shards on the leaving node are initializing and came from the adding node
// - all shards on the adding node are leaving (and implicitly, they should be going to the leaving node; I could add a defensive check for that too)
// - every leaving node has a unique adding node and every adding node has a unique leaving node (i.e. there is a bijection between them).
Codecov Report
Merging #2852 (7d8682e) into master (cb2d5f7) will decrease coverage by 5.9%. Report is 1063 commits behind head on master. The diff coverage is 56.2%.
@@            Coverage Diff            @@
##           master   #2852     +/-   ##
========================================
- Coverage    71.8%   65.9%    -5.9%
========================================
  Files        1109    1104       -5
  Lines      100594   99233    -1361
========================================
- Hits        72253   65485    -6768
- Misses      23423   29010    +5587
+ Partials     4918    4738     -180
| Flag | Coverage Δ | |
|---|---|---|
| aggregator | 74.5% <ø> (-1.5%) | :arrow_down: |
| cluster | 69.6% <56.2%> (-15.4%) | :arrow_down: |
| collector | 84.3% <ø> (ø) | |
| dbnode | 76.0% <ø> (-3.4%) | :arrow_down: |
| m3em | 68.6% <ø> (-5.8%) | :arrow_down: |
| m3ninx | 61.8% <ø> (-11.3%) | :arrow_down: |
| m3nsch | 78.0% <ø> (+26.8%) | :arrow_up: |
| metrics | 17.2% <ø> (ø) | |
| msg | 74.1% <ø> (+<0.1%) | :arrow_up: |
| query | 57.2% <ø> (-10.7%) | :arrow_down: |
| x | 68.1% <ø> (-12.0%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cb2d5f7...7d8682e.