FluidFramework icon indicating copy to clipboard operation
FluidFramework copied to clipboard

Normalize merge-tree's segment order on rebase of an insert op

Open Abe27342 opened this issue 3 years ago • 1 comments

Description

This fixes an eventual consistency issue exposed by fuzz testing. The core problem is that clients that are reconnecting may end up in a state in which they don't resolve tiebreaks of inserted segments the same way as other clients. This is because the tiebreaking logic makes assumptions about the relative ordering of inserted/removed segments in the collab window. Specifically, in the absence of reconnect it's never the case that a segment inserted at seq A appears immediately after a segment removed at seq B, where:

  • both A and B in the collab window
  • A > B
  • the ops with seq A and seq B are concurrent

(In this case, the insertion logic would have placed the inserted segment before the removed segment on all clients).

However, this invariant was no longer true with reconnect due to rebasing ops. The regression test demonstrates a case where this happens. The proposed fix implemented in this PR is to have the merge-tree update its internal state when one of its pending segment groups is rebased to instead be in terms of the current sequence number.

AB#1675

Abe27342 avatar Sep 13 '22 21:09 Abe27342

a thought i had on this area, could we augment the reconnect farm to hit it? that test is pretty simplistic right now.

anthony-murphy avatar Sep 15 '22 16:09 anthony-murphy

a thought i had on this area, could we augment the reconnect farm to hit it? that test is pretty simplistic right now.

Changing it to have both a client that always disconnects on application and a different one that only sometimes disconnects seems to suffice to make it hit the issue. There are still some failures on the long version of that test after doing so, but ~80% less (and not on new tests), so I'm reasonably comfortable checking this in.

Abe27342 avatar Sep 29 '22 16:09 Abe27342

@fluid-example/bundle-size-tests: +7.94 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 401.71 KB 404.61 KB +2.89 KB
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 200.17 KB 200.17 KB No change
loader.js 154.17 KB 154.17 KB No change
map.js 47.66 KB 47.66 KB No change
matrix.js 138.44 KB 140.59 KB +2.15 KB
odspDriver.js 153.15 KB 153.15 KB No change
odspPrefetchSnapshot.js 40.53 KB 40.53 KB No change
sharedString.js 158.57 KB 161.47 KB +2.89 KB
Total Size 1.3 MB 1.3 MB +7.94 KB

Baseline commit: 151ffe3b0a68be7701a3b3da61326d5a30a0910d

Generated by :no_entry_sign: dangerJS against 31821a884042f37599acbb0d7b095f1f0a329945

msfluid-bot avatar Oct 04 '22 16:10 msfluid-bot