Add `--backend-snapshot-with-purging-changes` flag

Open markthethomas opened this issue 2 years ago • 4 comments

What happened: Hey there! Yorkie seems really really cool - exactly what I've been looking for for a niche project of mine. Very cool to finally find a project that has support for realtime collaborative editing but isn't just the client side with everything else left to create from scratch.

I've got it running remotely behind envoy and am able to sync docs etc. Trying now to look at some performance and such for the database (using mongodb) and noticed that it doesn't look like garbage collection is happening after a snapshot is taken. Or it's possible I don't fully understand the GC behavior properly 😓 😄

What you expected to happen:

I've been using the Text CRDT type to do collaborative text editing, like so:

doc.update((root) => {
    if (content) {
        if (!root.content) {
            const text = root.createText("content");
            text.edit(content.start, content.end, content.text);
        } else {
            root.content.edit(content.start, content.end, content.text);
        }
    }
});

After 1000 edits, there are 1000 docs in the changes collection. I can see a snapshot being taken like so:

INFO	SNAP: '61d5e45922aadcfafba4606f$61d94aa4a3ff7fe9024d9cc9', serverSeq: 1010
INFO	RPC : "/api.Yorkie/PushPull" 653.961929ms
INFO	PUSH: '61d94aab0035805841153d7a' pushes 12 changes into '61d5e45922aadcfafba4606f$61d94aa4a3ff7fe9024d9cc9', rejected 0 changes, serverSeq: 1028 -> 1040, cp: serverSeq=1040, clientSeq=1040

However, it still seems like there are the same number of documents in the collection, which is surprising to me. I would've thought that the snapshot would remove the need for the entire history of changes made so far (maybe I'm off on that) or at least that some changes would've been removed as unnecessary. I noticed that there was a recent release that specifically enabled this to happen on snapshots, so wanted to file this issue in case either I'm missing something or it's a real bug. Thank you!

How to reproduce it (as minimally and precisely as possible):

see above

Anything else we need to know?:

Environment:

  • Operating system: linux (amd64 docker image)
  • Browser and version: n/a
  • Yorkie version (use yorkie version): 0.2.1
  • Yorkie JS SDK version: 0.2.0

markthethomas avatar Jan 08 '22 22:01 markthethomas

Thank you for your interest in the Yorkie project. I'm glad to hear that you agree with our project's goal of working out of the box.

The currently implemented GC focuses on removing tombstones in the CRDT. So the GC removes tombstones from the snapshot; it does not delete the changes.

  • Briefly about tombstone: the following website
  • Snapshot size after GC: https://github.com/yorkie-team/yorkie/pull/287
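
To illustrate the distinction, here is a minimal toy sketch of tombstones in a sequence. It is not Yorkie's actual data structure: `ToyText`, `removedAt`, and `minSyncedTime` are hypothetical names, and the sketch assumes a tombstone can be purged once every client has seen the deletion.

```javascript
// Toy model only: hypothetical names, not Yorkie's real implementation.
class ToyText {
  constructor() {
    // Each node keeps its value plus the logical time it was removed (null if live).
    this.nodes = [];
  }

  append(value) {
    this.nodes.push({ value, removedAt: null });
  }

  // Mark the liveIndex-th visible node as removed instead of deleting it.
  // The node stays in memory as a tombstone.
  remove(liveIndex, time) {
    const live = this.nodes.filter((n) => n.removedAt === null);
    if (liveIndex < 0 || liveIndex >= live.length) {
      throw new RangeError("index out of range");
    }
    live[liveIndex].removedAt = time;
  }

  // Only live nodes are visible to the user.
  toString() {
    return this.nodes
      .filter((n) => n.removedAt === null)
      .map((n) => n.value)
      .join("");
  }

  // Physically drop tombstones removed at or before minSyncedTime
  // (assumed here to be the point every client has synced past).
  gc(minSyncedTime) {
    const before = this.nodes.length;
    this.nodes = this.nodes.filter(
      (n) => n.removedAt === null || n.removedAt > minSyncedTime
    );
    return before - this.nodes.length;
  }
}

const text = new ToyText();
for (const ch of "abcd") text.append(ch);
text.remove(1, 5); // tombstones "b" at logical time 5
text.remove(1, 9); // tombstones "c" at logical time 9
const purged = text.gc(5); // only the tombstone for "b" is old enough to purge
```

In this sketch the visible text never changes during `gc`; only the hidden tombstones shrink, which is why the snapshot gets smaller while the number of stored changes stays the same.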

However, as you mentioned, we can also delete changes before the snapshot creation time, because after creating a snapshot, we can use the snapshot to rebuild a specific state of the document. This can prevent the changes collection from continuing to grow.

I think it would be good to add a flag to delete previous changes after snapshot creation: `--backend-snapshot-with-purging-changes`
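
A rough sketch of what such a flag could do, using an in-memory stand-in for the changes collection. Everything here is an assumption for illustration: `ToyChangeStore` and `purgeChangesOnSnapshot` are hypothetical names, each change simply appends a string, and the real server logic may differ.

```javascript
// Toy model only: hypothetical names, not the Yorkie server's real code.
class ToyChangeStore {
  constructor({ purgeChangesOnSnapshot = false } = {}) {
    this.purgeChangesOnSnapshot = purgeChangesOnSnapshot;
    this.changes = []; // entries of the form { serverSeq, op }
    this.snapshot = null; // { serverSeq, state } once a snapshot exists
    this.nextSeq = 1;
  }

  // Store one change and assign it the next server sequence number.
  push(op) {
    this.changes.push({ serverSeq: this.nextSeq++, op });
  }

  // Rebuild the document: start from the snapshot (if any),
  // then replay only the changes made after it.
  build() {
    let state = this.snapshot ? this.snapshot.state : "";
    const from = this.snapshot ? this.snapshot.serverSeq : 0;
    for (const change of this.changes) {
      if (change.serverSeq > from) state += change.op;
    }
    return state;
  }

  // Take a snapshot at the latest serverSeq and, if enabled,
  // delete every change the snapshot already covers.
  takeSnapshot() {
    const state = this.build();
    const serverSeq = this.nextSeq - 1;
    this.snapshot = { serverSeq, state };
    if (this.purgeChangesOnSnapshot) {
      this.changes = this.changes.filter((c) => c.serverSeq > serverSeq);
    }
    return this.snapshot;
  }
}

const store = new ToyChangeStore({ purgeChangesOnSnapshot: true });
for (const ch of "hello") store.push(ch);
store.takeSnapshot(); // covers serverSeq 1..5 and purges those changes
store.push("!"); // serverSeq 6, kept because it is newer than the snapshot
const rebuilt = store.build();
const remaining = store.changes.length;
```

The trade-off in this sketch is that the history before the snapshot is gone once purged, so the document can only be rebuilt from the snapshot onward. That is a reason to keep the behavior behind an opt-in flag.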

hackerwins avatar Jan 09 '22 14:01 hackerwins

ahhh I see - that makes sense. so the document itself is compressed versus the history of the document. Makes total sense!

I would tend to agree re: the purge changes flag. It seems like you would run into extremely high document counts with any kind of significant usage around text editing especially. As long as the option is tunable, it seems like it would be a great idea to have 👍

markthethomas avatar Jan 10 '22 00:01 markthethomas

Could I try this issue?

chromato99 avatar Jul 20 '22 10:07 chromato99

@chromato99 Sure. If you have any questions please let me know.

hackerwins avatar Jul 20 '22 10:07 hackerwins