FluidFramework
Runtime: Multi-stage id-compressor enablement
Problem
The primary challenge with enabling Id compression is that a single batch of ops may contain both the migration op that enables Id compression and the first use of the Id compressor. Because we guarantee that ops in the same batch are processed synchronously, the chunk containing the Id compressor must already be loaded before the migration op that turns it on is processed.
However, to avoid impacting customers who are not using Id compression, we do not load the IdCompressor chunk when it is in the "off" state. Therefore, we have a small paradox where the runtime needs to load the Id compressor prior to enabling it, but the runtime will not be aware it is being enabled until it's too late to load the id compressor chunk.
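The paradox can be illustrated with a minimal sketch. This is not the Fluid Framework runtime: `RuntimeSketch`, `Op`, and the op kinds are hypothetical names invented for illustration. The only behavior assumed from the text is that a batch is processed synchronously, so there is no point at which the chunk could be loaded asynchronously mid-batch.

```typescript
// Hypothetical op shapes, for illustration only.
type Op =
  | { kind: "enableIdCompression" }
  | { kind: "useCompressedId"; id: number };

// Hypothetical stand-in for a container runtime.
class RuntimeSketch {
  private compressorLoaded: boolean;
  private compressionEnabled = false;

  constructor(compressorLoaded: boolean) {
    this.compressorLoaded = compressorLoaded;
  }

  // Synchronous by design: ops in one batch are processed without yielding,
  // so an async chunk load cannot be awaited between two ops.
  processBatch(batch: Op[]): void {
    for (const op of batch) {
      if (op.kind === "enableIdCompression") {
        if (!this.compressorLoaded) {
          throw new Error("IdCompressor chunk is not loaded and cannot be loaded mid-batch");
        }
        this.compressionEnabled = true;
      } else if (!this.compressionEnabled) {
        throw new Error("Compressed id used before Id compression was enabled");
      }
    }
  }
}
```

A runtime constructed with `compressorLoaded = false` (the "off" case) fails on a batch that both enables compression and uses a compressed id, while one that loaded the chunk ahead of time processes the same batch without error.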
Proposed Solution
The proposed solution is to use a two-phase enablement. During phase 1, Loop will transition from "off" -> "delayed".
Important: The behavior of "delayed" has changed. Previously, "delayed" would asynchronously load the Id compressor and begin using compressed Ids. In this proposal, transitioning to "delayed" has no immediate effect. However, after the migration to "delayed", subsequent loads of the container will fetch the IdCompressor chunk (but will not yet begin using Id compression).
Once 99.99% of active sessions have reloaded in the "delayed" phase (i.e., have the IdCompressor chunk loaded), Loop will transition from "delayed" -> "on" to enable Id compression. The 0.01% of clients that remained connected throughout the transition will now be forced to disconnect/reload.
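The per-mode behavior at container load described above can be summarized in a short sketch. The names `IdCompressorMode`, `LoadPlan`, and `planForMode` are hypothetical and do not come from the Fluid Framework API; the table of behaviors is taken from the proposal text.

```typescript
// The three states named in this proposal.
type IdCompressorMode = "off" | "delayed" | "on";

// Hypothetical description of what a container load does in each mode.
interface LoadPlan {
  loadCompressorChunk: boolean; // fetch the IdCompressor chunk at load time
  useCompressedIds: boolean; // actually generate and consume compressed Ids
}

function planForMode(mode: IdCompressorMode): LoadPlan {
  switch (mode) {
    case "off":
      // Customers not using Id compression never pay for the chunk.
      return { loadCompressorChunk: false, useCompressedIds: false };
    case "delayed":
      // Chunk is fetched on load, but compression is not yet in use.
      return { loadCompressorChunk: true, useCompressedIds: false };
    case "on":
      return { loadCompressorChunk: true, useCompressedIds: true };
  }
}
```

The key property is that "delayed" differs from "off" only in loading the chunk, which is what makes the later "delayed" -> "on" op safe to process synchronously.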
One edge case not covered above is containers that were never loaded during the "off" -> "delayed" transition. These are handled in the following way:
If a container is "off" and the desired state is "on", the pre-migration state is coerced to "delayed" (i.e., the container runtime will load the Id compressor, but not activate Id compression).
We then perform a "delayed" -> "on" migration in the normal way.
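The edge-case handling above can be sketched as a small pure function. `Mode`, `coercePreMigrationMode`, and `migrationSteps` are hypothetical names, not part of the Fluid Framework API. The sketch also assumes that migrations only move forward through "off", "delayed", "on"; the proposal does not describe moving backward.

```typescript
type Mode = "off" | "delayed" | "on";

// If the container is still "off" but the desired state is "on", treat it as
// "delayed" so the runtime loads the Id compressor without activating it.
function coercePreMigrationMode(current: Mode, desired: Mode): Mode {
  return current === "off" && desired === "on" ? "delayed" : current;
}

// Returns the migration ops still required after coercion.
// Assumption: only forward migrations are modeled.
function migrationSteps(current: Mode, desired: Mode): Mode[] {
  const order: Mode[] = ["off", "delayed", "on"];
  const from = order.indexOf(coercePreMigrationMode(current, desired));
  const to = order.indexOf(desired);
  return to > from ? order.slice(from + 1, to + 1) : [];
}
```

Under these assumptions, a container that is "off" with a desired state of "on" needs only the single "delayed" -> "on" migration, because the coercion has already caused the chunk to be loaded.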