FluidFramework icon indicating copy to clipboard operation
FluidFramework copied to clipboard

An experiment with immediately attaching data stores

Open vladsud opened this issue 11 months ago • 2 comments

This is a prototype to see if we can get rid of various detached state and ensure that we can attach objects right away.

Why

We want to move into this direction because it allows us to remove a ton of complexity, code, and state tracking related to managing of swarm of local (non-attached) objects and attachment of such swarm. This code is extremely hard to follow, easy to break, and very hard to maintain. On top of it, it results in explosion of states (in our public API surface) that are hard to understand, like - node

  1. Not attached to parent
  2. Attached to parent, but parent is not attached to its parent (container)
  3. Attached all the way to container, but container is not attached.

Our attach states do not track that properly. And to the contrary, they expose states that should not be exposed (difference between attached & attaching does not matter for 99.9% of the code, it only matters internally for tracking pending attach ops). The newly visible states are better, but pretty much nobody uses them.

What

This change immediately attaches DDSs & Data stores to their parent (data store and container) on completion of initialization. For DDS, that happens instantly on creation. For data store - only when data store fully initialized (i.e. entry point of data store was called, which allows such things as DataObject to run through initialization code; you can also see https://github.com/microsoft/FluidFramework/pull/19956 for details where and how that happens).

Why it's safe

Data stores have internal IDs that are not possible to provide (they are auto-assgned). Until data store is aliased, or its handle is put somewhere into the container graph, remote clients can't observe such data store, even if it shows up in the file (we do not expose any kind of data store enumeration API or any other tool to "find" data store in the graph, other than by following handle links, starting with aliased data stores). As such, user of data store is in full control when such data store becomes visible to remote clients, and attaching it to a file does not change that.

DDS story is a bit more nuanced - please see comment in this PR. That said, it's likely much easier problem to manage in terms of ability to review usage in existing consumers, and corner cases where we could run into problems could be instrumented (before this change is enabled) such that we know more if it's a problem or not.

GC

It's worth pointing out that in a case where someone was creating detached data stores and never attached them, we will create more garbage in a file with this approach. This will put more pressure on GC to cleanup such garbage (over time). IMHO, ideally users should not even have ability to operate with detached / attached concepts, so this pattern is somewhat anti-pattern (if leveraged explicitly, i.e. not by accident). But this is one of those places where we are changing somewhat behavior of the system.

More ops

It's also worth acknowledging that this will result in more ops, as we are shrinking the distance between data store being fully initialized to being visible in the graph from unknown to zero. In the past, no ops would be flying in this range (as detached data store does not generate ops). I'd say this change was harder to consider without op grouping feature. But with all the recent changes to optimize our op pipeline, the impact here should be manageable. Note that it's not given that amount of traffic / bandwidth would be higher - it's scenario dependent. If there is not much overlapping activity (like overwriting same keys in a map), then changes would be accounted either in data store attach op or in individual ops. Individual ops have higher cost, but most of it is mitigated in protocol with grouping and compression (that said, it's more expensive to apply many ops compared to one op)

Details

Change includes:

  1. DataStore level: Calling channel.makeVisibleAndAttachGraph() at the end of data store creation process
  2. Attaching DDSs right away
  3. Fixing UTs

This is not final shape - if we want to ship it, we need to flight it properly.

It should be safe as data stores by default are not reachable by other clients (there is no way to enumerate datastores, and no way to guess their internal IDs). Only when data store gets alias or its handle is put into the graph, it becomes visible - when we reach that point, system should be in exactly same condition as before this change.

Somewhat weaker argument for DDSs (see comment in the code).

The hardest part here would be moving Loop off

  1. I do not see any examples in their repo where handle.isAttached or SharedObject.isAttached are used.
  2. But FluidDataStoreRuntime.attachState has 26 hits.

vladsud avatar Mar 05 '24 15:03 vladsud

@fluid-example/bundle-size-tests: -32 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 516.23 KB 516.22 KB -10 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 249.82 KB 249.8 KB -22 Bytes
loader.js 128.1 KB 128.1 KB No change
map.js 41.36 KB 41.36 KB No change
matrix.js 143.87 KB 143.87 KB No change
odspDriver.js 97.23 KB 97.23 KB No change
odspPrefetchSnapshot.js 41.91 KB 41.91 KB No change
sharedString.js 161.73 KB 161.73 KB No change
sharedTree.js 330.73 KB 330.73 KB No change
Total Size 1.82 MB 1.82 MB -32 Bytes

Baseline commit: ae1d0be26d61453cff316b3f622a9f3647149167

Generated by :no_entry_sign: dangerJS against afa555304057f209891b66bc1b73183d20bfbc82

msfluid-bot avatar Mar 05 '24 21:03 msfluid-bot

This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!