FluidFramework icon indicating copy to clipboard operation
FluidFramework copied to clipboard

Azure client must use batching.

Open andre4i opened this issue 1 year ago • 4 comments

Description

FlushMode.Immediate will be deprecated soon. This is the only place where it is still being used, mainly due to the 1MB payload size limit which we've since fixed with different other features described at: https://github.com/microsoft/FluidFramework/tree/main/packages/runtime/container-runtime/src/opLifecycle. Right now, azure client does not use batching at all, and [this may cause all sorts of issues such as client observing partial states](https://github.com/microsoft/FluidFramework/issues/18082

https://github.com/microsoft/FluidFramework/tree/main/packages/runtime/container-runtime/src/opLifecycle#how-batching-works

  • Compression and chunking are on by default and will solve more than 90% of the payload size issues
  • Op grouping will solve the rest, however it is not enabled by default nor in this PR.

This is the code change, which is trivial. The rollout is not, so I need some guidance here from the owners:

  • Compression is not compatible from 1.0 to 2.0. And with compression enabled by default (since client_v2.0.0-internal.4.1.0), a client must turn it off by feature gate, wait for everyone to move to 2.0, then remove the feature gate.
  • Op grouping requires all connected clients to be on 2.0.0-internal.7.0.0 or later, hence the reason for not enabling it in this change.

andre4i avatar Feb 21 '24 22:02 andre4i

@fluid-example/bundle-size-tests: +187 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 511.14 KB 511.18 KB +44 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 245.01 KB 245.03 KB +22 Bytes
loader.js 170.65 KB 170.67 KB +22 Bytes
map.js 46.53 KB 46.54 KB +11 Bytes
matrix.js 148.68 KB 148.69 KB +11 Bytes
odspDriver.js 97.34 KB 97.37 KB +33 Bytes
odspPrefetchSnapshot.js 42.28 KB 42.3 KB +22 Bytes
sharedString.js 167.42 KB 167.43 KB +11 Bytes
sharedTree.js 334 KB 334 KB No change
Total Size 1.87 MB 1.87 MB +187 Bytes

Baseline commit: ff2157ba9d0198bdde5f074a19933a54c0e07ab8

Generated by :no_entry_sign: dangerJS against 5a2673dd2120737a1ff51afe94705effba317b24

msfluid-bot avatar Feb 21 '24 22:02 msfluid-bot

Based on description and my understanding of Azure 2.0 rollout, it seems like customers on Azure API will need to perform a rollout of 2.0 with compression and op grouping disabled before it's safe to turn on those features. So e.g. two options I could imagine would be to turn off those features in 2.0 and wait for 3.0 (seems disappointing), or expose them to the azure API surface somehow with clear guidance around when it's safe to enable.

@sashasimic Do we have a plan here?

Abe27342 avatar Feb 28 '24 23:02 Abe27342

@Abe27342 I believe @pk-pranshu is going to drive this discussion? I never got much response to my comment in AB#5699, so that might be a good place to weigh in if you have opinions.

FWIW, I'm of the opinion that individual feature owners need to think about how their feature is realistically going to ship end-to-end, and that includes AzureClient. I'm of course happy to be a resource here, but their expertise and knowledge of the implementation, options, and ramifications is what will drive a good design.

ChumpChief avatar Feb 28 '24 23:02 ChumpChief

@Abe27342 can you evaluate whether this is still relevant? There is an open issue that has a dependency on this PR but it was approved but never merged.

nmsimons avatar Jul 16 '24 17:07 nmsimons

It seems like this was addressed by #20997. Once customers upgrading from 1.0 have saturated their ecosystem with 2.0 clients, they can flip the switch using these APIs and they'll switch to turn-based flushing.

Abe27342 avatar Jul 16 '24 17:07 Abe27342