pycrdt

Test update squashing

Open davidbrochart opened this issue 2 months ago • 5 comments

@Horusiath I don't understand why it is not possible to squash updates when they come from different documents, while it is when they come from the same document. Do you have any idea?

davidbrochart, Oct 24 '25 09:10

Each update is represented by a so-called Block structure: each block consists of user data plus the yrs/yjs metadata used for conflict resolution. These blocks are what gets squashed when updates are merged: as part of this process, we can concatenate the user data and merge the metadata of several blocks into one.

Squashing has several prerequisites. One of them is sequentiality: we can only squash updates that were made to the same collection, one after another, by the same client.

Since these changes were not made by the same client (each Doc has its own client ID), we cannot squash the blocks. If we tried, we'd lose data required for potential conflict resolution.
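To make the sequentiality rule concrete, here is a toy Python model. It is not the yrs or pycrdt API: `Block`, `can_squash` and `squash` are hypothetical names, and a block is reduced to a client ID, a per-client clock, the name of its parent collection and some text. It checks only the three prerequisites named above; real yrs/Yjs compares additional block metadata before merging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    client: int   # client ID of the Doc that produced the block
    clock: int    # per-client sequence number of the block's first item
    parent: str   # name of the shared collection the block belongs to
    content: str  # user data (plain text, for simplicity)

    @property
    def end(self) -> int:
        # Clock value the same client would assign to its next item
        return self.clock + len(self.content)


def can_squash(a: Block, b: Block) -> bool:
    # Same collection, same client, and b starts exactly where a ends
    return a.parent == b.parent and a.client == b.client and a.end == b.clock


def squash(blocks: list[Block]) -> list[Block]:
    out: list[Block] = []
    for b in blocks:
        if out and can_squash(out[-1], b):
            last = out[-1]
            # Concatenate user data; the merged block keeps the first block's ID
            out[-1] = Block(last.client, last.clock, last.parent, last.content + b.content)
        else:
            out.append(b)
    return out


# Two consecutive edits by client 1 collapse into one block...
same_doc = squash([Block(1, 0, "text", "ab"), Block(1, 2, "text", "cd")])
# ...but edits from two Docs (clients 1 and 2) must stay separate.
two_docs = squash([Block(1, 0, "text", "ab"), Block(2, 0, "text", "cd")])
print(same_doc)  # [Block(client=1, clock=0, parent='text', content='abcd')]
print(len(two_docs))  # 2
```

Merging blocks from clients 1 and 2 would force one of the two client IDs onto data the other client produced, which is exactly the conflict-resolution information the explanation says would be lost.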

Horusiath, Oct 26 '25 19:10

I see, thanks for the explanation.

davidbrochart, Oct 26 '25 19:10

If we tried, we'd lose data required for potential conflict resolution.

What is the best approach if the cost of keeping the history outweighs the cost of conflict resolution going wrong?

krassowski, Oct 27 '25 13:10

If you don't care about conflict resolution or snapshots, you can either:

  1. recreate a doc - basically create a new clean document and reinsert its contents.
  2. replace an individual shared collection - delete the old collection and reinsert its contents. With GC turned on and no undo manager tracking that collection, this will cause garbage collection of the entire collection and all of its children.
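Option 1 can be illustrated with another toy sketch, assuming a simplified model where a document is just an ordered list of blocks and deleted content is kept as tombstones. `Block`, `visible_text` and `recreate` are hypothetical names, not pycrdt functions. Recreating keeps only the visible text, written under a fresh client ID, which is why it discards the history needed for conflict resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    client: int            # client ID of the Doc that created the block
    clock: int             # per-client sequence number of the block's first item
    content: str           # user data
    deleted: bool = False  # tombstone: hidden from users but still stored


def visible_text(blocks: list[Block]) -> str:
    # In this toy model, list order is document order.
    return "".join(b.content for b in blocks if not b.deleted)


def recreate(blocks: list[Block], new_client: int) -> list[Block]:
    # Start a clean document under a fresh client ID and reinsert only the
    # current contents; tombstones and per-client history are dropped.
    text = visible_text(blocks)
    return [Block(new_client, 0, text)] if text else []


history = [
    Block(1, 0, "Hello"),
    Block(2, 0, " there", deleted=True),
    Block(3, 0, " world"),
]
fresh = recreate(history, new_client=42)
print(visible_text(fresh))  # Hello world
print(len(history), "->", len(fresh))  # 3 -> 1
```

The trade-off is the one raised in the question: the recreated document is compact, but a client still holding the old document cannot merge its updates into the new one, because none of the old block IDs exist there.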

Horusiath, Oct 28 '25 06:10

  1. recreate a doc - basically create a new clean document and reinsert its contents.

I think this is the best option, e.g. in Jupyter, but it needs cooperation from clients. On the server, we could mark the document as invalid, create a new one, and provide information about how to connect to the new document (e.g. a new room ID). Clients should observe the invalid attribute, drop their document when it is set, and connect to the new document.
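The hand-off described above could look roughly like this. It is a minimal in-memory sketch under stated assumptions: `Server`, `Room`, `Client`, the `invalid` flag and the `next_room` key are all hypothetical, and document sync is reduced to copying a string. It shows only the protocol shape: mark the old room invalid, point to a new room, and let clients reconnect.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Room:
    room_id: str
    content: str  # stand-in for the shared document state
    meta: dict[str, str] = field(default_factory=dict)
    observers: list[Callable[[dict[str, str]], None]] = field(default_factory=list)

    def set_meta(self, key: str, value: str) -> None:
        self.meta[key] = value
        for callback in list(self.observers):  # copy: callbacks may unsubscribe
            callback(self.meta)


class Server:
    def __init__(self) -> None:
        self.rooms: dict[str, Room] = {}
        self._ids = itertools.count()

    def create_room(self, content: str) -> Room:
        room = Room(f"room-{next(self._ids)}", content)
        self.rooms[room.room_id] = room
        return room

    def compact(self, room_id: str) -> Room:
        old = self.rooms[room_id]
        new = self.create_room(old.content)  # fresh doc with current content only
        old.set_meta("next_room", new.room_id)  # publish the destination first
        old.set_meta("invalid", "true")         # then invalidate the old room
        return new


class Client:
    def __init__(self, server: Server, room_id: str) -> None:
        self.server = server
        self.connect(room_id)

    def connect(self, room_id: str) -> None:
        self.room = self.server.rooms[room_id]
        self.content = self.room.content  # local copy of the document
        self.room.observers.append(self._on_meta)

    def _on_meta(self, meta: dict[str, str]) -> None:
        if meta.get("invalid") == "true":
            # Drop the old document and follow the pointer to the new room
            self.room.observers.remove(self._on_meta)
            self.connect(meta["next_room"])


server = Server()
old = server.create_room("hello")
client = Client(server, old.room_id)
new = server.compact(old.room_id)
print(client.room.room_id, client.content)  # room-1 hello
```

Setting `next_room` before `invalid` means a client reacting to the flag always finds the new room ID already present.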

davidbrochart, Oct 28 '25 07:10