FluidFramework
FluidFramework copied to clipboard
Pending (in-flight) connection request may result in breaking invariants
Based on this query:
union Office_Fluid_FluidRuntime_* | where Data_docId == "Qd72Ck%2BXltvWMehs0wk6O3Bn1oTt%2BeTjXpPbIZ5uYv8%3D" | project Event_Sequence, Event_Time, Data_eventName, Data_reason, Data_error, Data_value, Data_connectionMode
- Some local edits are made, which causes container to switch from default "read" connection to "write" connection.
- forceReadOnly() is called when container is in process of switching, i.e. it has no connection (old is closed, new one is not established)
- forceReadOnly() is supposed to drop "write" connection, but given there is no connection in place, it's noop
- When connection is established, runtime attempts to send pending ops
- This results in hitting deltaManagerReadonlySubmit error and container closure.
- That's a check that is supposed to find all wrong-doing by components, i.e. making local edits while file is read-only. That's a no-no today, as file can be read-only because user has no write permissions or file is locked - we should not allow any changes in such cases
This flow breaks invariant: read-only container should not have "write" connection. That's very similar to discussion in connect/disconnect PR that we had with @scottn12 - we need to figure out how to validate connection against constraints / invariants that were put in place while connection was in flight.
I see two ways to do so:
- Have some validation code when connection is finally established (connection completes).
- Add cancelation mechanism and ability for code to inspect properties of pending connection (like connection mode).
I prefer #2, as it allows to localize code in right place. I.e. forceReadOnly() code would check against type of connection and cancel "write" connections. disconnect() flow would cancel any pending connection.
What happens to ops that were sent before forceReadonly() was called? I would think we want to send those ops since they were valid and then set to read-only. I am thinking we will have to stop new ops (close container on getting them?), process all current / pending ops and then switch to read-only mode.
@agarwal-navin, ops are not kept around, if connection is active, all of them are sent out. If they were lost in translation, then runtime make retry to send them on "read" reconnection and that will fail container (I believe). I have not seen it in the logs (high enough), so probably need to fix it, but it's not urgent - I'd prefer separate bug on that. Let's track actual connection properties invariant with this issue.
@scottn12, you mentioned you will be working on it (and I saw some PRs), assigning to you.
Sure. But we should have an issue that tracks what happens to the pending ops. It will otherwise look like data loss from user's perspective.
I think you are pointing to something else. Client can be offline with pending changes and switch to read-only mode. In this case all local changes will never make it to service. There is not much runtime can do about it - it's a choice of the host app to get into such state. The case I've described is a corner case of that, it does not happen that often.
Okay, that makes sense. Should we add a note to forceReadonly
API that says pending ops may be lost and to avoid that check that container is not dirty before doing so?
sure :)
We can close this right @vladsud and @scottn12 ? AB#300 is done, and I opened AB#1246 which tracks additional gaps discussed here (and a specific crash not due to in-flight connections but just a bug in the state machine)
This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!