
Draft: support concurrent replaces within the mirrored placement code


This is very WIP; I'm putting it up for @abliqo and potentially @prateek to look at, but it's otherwise too messy to really consider (and, importantly, it's mostly untested ;)).

This implements a fix for https://github.com/m3db/m3/issues/2850.

The high-level approach:

Limit the instances we mark available to those affected by the replace, i.e. the shard set pair of the leaving (replaced) instances. This allows replaces that operate on independent shard sets to proceed concurrently.

What this means in practice is that we:

  • Look up the shard set pair for the node being replaced
  • For each node in that shard set, try to mark its shards available

Marking shards available on a node means (see the sketch after this list):

  • For each initializing shard, find the corresponding leaving shard. If the leaving shard's cutoff has passed, remove it and mark the initializing shard available.
  • For each leaving shard, find the corresponding initializing shard and proceed as above.
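To make the flow concrete, here is a minimal sketch of that per-shard-set pass. It uses simplified stand-in types and made-up names (`pshard`, `instance`, `markShardSetAvailable`), not the actual m3 placement/shard API, and it only shows the initializing-to-leaving direction; the reverse direction from the second bullet is symmetric.

```go
package placementsketch

// Simplified stand-ins for placement instances and shards; the real m3 types
// live in src/cluster/placement and src/cluster/shard.
type shardState int

const (
	initializing shardState = iota
	leaving
	available
)

type pshard struct {
	id          uint32
	state       shardState
	sourceID    string // instance the shard is being copied from
	cutoffNanos int64  // when the leaving copy stops serving
}

type instance struct {
	id         string
	shardSetID uint32
	shards     map[uint32]*pshard
}

// markShardSetAvailable runs the "mark available" pass only over the shard
// set of the node being replaced, so replaces that touch independent shard
// sets can proceed concurrently.
func markShardSetAvailable(instances []*instance, replacedShardSetID uint32, nowNanos int64) {
	// Index instances by ID so an initializing shard can find the peer that
	// still holds its leaving counterpart.
	byID := make(map[string]*instance, len(instances))
	for _, in := range instances {
		byID[in.id] = in
	}

	for _, in := range instances {
		if in.shardSetID != replacedShardSetID {
			continue // instances in other shard sets are left untouched
		}
		for _, s := range in.shards {
			if s.state != initializing {
				continue
			}
			src, ok := byID[s.sourceID]
			if !ok {
				continue
			}
			leavingShard, ok := src.shards[s.id]
			if !ok || leavingShard.state != leaving {
				continue
			}
			if nowNanos < leavingShard.cutoffNanos {
				continue // cutoff not reached yet; leave both shards as-is
			}
			delete(src.shards, s.id) // remove the leaving shard...
			s.state = available      // ...and mark the initializing shard available
		}
	}
}
```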

One other bit of nuance here:

The mirrored algorithm has special casing around reclaiming shards: the case where a node left the cluster and was then added back before its cutovers/cutoffs had passed. That path also ended up needing some changes here.

The reclaim code walks the entire placement and pulls back any shard whose sourceID is the re-added node. For instance:

nodeA: {id = 1, state = INIT, source = nodeB}
nodeB: {id = 1, state = LEAVING}

If nodeB is re-added to the placement, it will reclaim shard 1.
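For illustration, a minimal sketch of that reclaim pass, continuing in the same hypothetical package as the sketch above (same simplified types; `reclaimLeavingShards` is a made-up name, not m3's actual reclaim code):

```go
// reclaimLeavingShards walks the entire placement and gives back any shard
// that is still initializing from the re-added node: the initializing copy is
// dropped and the re-added node's leaving copy becomes available again.
func reclaimLeavingShards(instances []*instance, readded *instance) {
	for _, in := range instances {
		if in.id == readded.id {
			continue
		}
		for id, s := range in.shards {
			if s.state != initializing || s.sourceID != readded.id {
				continue
			}
			if orig, ok := readded.shards[id]; ok && orig.state == leaving {
				delete(in.shards, id)  // cancel the pending handoff...
				orig.state = available // ...and keep the shard on the re-added node
			}
		}
	}
}
```

In the example above, nodeA's initializing copy of shard 1 would be dropped and nodeB's copy would return to available.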

The previous replace code relies on the nodes being operated on being the only ones in flux. That is, if you have:

nodeA: {id = 1, state = INIT, source = nodeB}
nodeB: {id = 1, state = LEAVING}
nodeC: {id = 2, state = INIT, source = nodeD}
nodeD: {id = 2, state = LEAVING}

it will refuse to reclaim nodeB's shards, even though in principle it could. I think the current check is overly conservative; a re-added (adding) node should be able to reclaim its shards from the leaving node that replaced it if the following hold (sketched in code after the comment block):

// The leaving instances were a replacement for the adding instances iff:
//  - all shards on the leaving node are initializing and came from the adding node
//  - all shards on the adding node are leaving (and, implicitly, they should be going to the leaving node; a defensive check could be added for that too)
//  - every leaving node has a unique adding node and every adding node has a unique leaving node (i.e. there is a bijection between them)
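A sketch of what that check could look like, again using the simplified types from the earlier sketches (`isLeavingReplacementFor` and `pairReplacements` are illustrative names, not existing m3 functions):

```go
// isLeavingReplacementFor reports whether leavingInst was a replacement for
// addingInst: every shard on the leaving node is initializing from the adding
// node, and every shard on the adding node is leaving.
func isLeavingReplacementFor(leavingInst, addingInst *instance) bool {
	if len(leavingInst.shards) == 0 {
		return false // nothing to hand back; treat as no replacement
	}
	for _, s := range leavingInst.shards {
		if s.state != initializing || s.sourceID != addingInst.id {
			return false
		}
	}
	for _, s := range addingInst.shards {
		// A defensive check that these shards are headed to the leaving node
		// could go here as well; this sketch only checks the state.
		if s.state != leaving {
			return false
		}
	}
	return true
}

// pairReplacements reports whether there is a bijection between leaving and
// adding instances such that each leaving instance is a replacement for its
// paired adding instance. Greedy matching suffices because a leaving
// instance's shards all come from a single adding node, so it can match at
// most one adding instance.
func pairReplacements(leavingInsts, addingInsts []*instance) bool {
	if len(leavingInsts) != len(addingInsts) {
		return false
	}
	used := make(map[string]bool, len(addingInsts))
	for _, l := range leavingInsts {
		matched := false
		for _, a := range addingInsts {
			if used[a.id] || !isLeavingReplacementFor(l, a) {
				continue
			}
			used[a.id] = true
			matched = true
			break
		}
		if !matched {
			return false
		}
	}
	return true
}
```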

— andrewmains12, Nov 06 '20 22:11

Codecov Report

Merging #2852 (7d8682e) into master (cb2d5f7) will decrease coverage by 5.9%. Report is 1063 commits behind head on master. The diff coverage is 56.2%.



@@           Coverage Diff            @@
##           master   #2852     +/-   ##
========================================
- Coverage    71.8%   65.9%   -5.9%     
========================================
  Files        1109    1104      -5     
  Lines      100594   99233   -1361     
========================================
- Hits        72253   65485   -6768     
- Misses      23423   29010   +5587     
+ Partials     4918    4738    -180     
Flag         Coverage Δ
aggregator   74.5% <ø> (-1.5%)
cluster      69.6% <56.2%> (-15.4%)
collector    84.3% <ø> (ø)
dbnode       76.0% <ø> (-3.4%)
m3em         68.6% <ø> (-5.8%)
m3ninx       61.8% <ø> (-11.3%)
m3nsch       78.0% <ø> (+26.8%)
metrics      17.2% <ø> (ø)
msg          74.1% <ø> (+<0.1%)
query        57.2% <ø> (-10.7%)
x            68.1% <ø> (-12.0%)

Flags with carried forward coverage won't be shown.


Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cb2d5f7...7d8682e.

— codecov[bot], Sep 11 '23 10:09