automerge-classic

Compress or reset History

Open johannesjo opened this issue 4 years ago • 7 comments

This is a follow-up on #168. It really would be great to have an option to reduce the size of the history. My own use case would be a single-user app. Consider the following scenario:

(no undo/redo is possible)
User makes changes on device A
User makes changes on device B
Both devices are online and sync their changes
=> any history up to this point can be thrown away

Would be nice to have a quick way to do this.

johannesjo avatar May 20 '20 18:05 johannesjo

This does seem useful, although so far our focus has been on representing history efficiently, so that you can keep history without too much storage or performance cost. See #253 on our progress in this regard.

Can I ask you more about your needs for clearing history? Is it to reduce storage requirement, or some other reason?

Moreover, there are API design challenges in making this work. If you want to throw away history that is present on all of your devices, that requires knowing what all your devices are, and what state they are in. At present, Automerge does not keep track of other devices or their state. Do you have a proposal on how we should allow applications to specify which portions of the history are fully synced, and which portions need to be retained because some devices haven't seen it yet? This mechanism should also deal with the fact that sometimes devices suddenly disappear and never come back (e.g. because their owner dropped it in the toilet).

ept avatar May 21 '20 19:05 ept

I am developing the time tracking/to-do app Super Productivity. Over the last 3 years I have accumulated 4 MB of data for myself using the app, which is not an awful lot on disk, but quite something when syncing requires you to send the complete data over the wire at regular intervals. Furthermore, time tracking updates the model every second, so I am afraid that it will generate a lot of history (not sure how much the counter type will mitigate this problem).

I made a little flow chart on how I could see this work. I am pretty new to automerge so chances are I'm missing something important:

[Image: flow chart, "Untitled Diagram"]

A, B, C are possible different actorIds. I am not sure what the best way is to determine the lastUpdate value and if a timestamp might work or if we can determine this by the objectId.

johannesjo avatar May 22 '20 10:05 johannesjo

I ran some tests and with 2 changes per second it gets ugly pretty quickly. In fact, after running this for half an hour I am almost unable to start the app again. It takes ages to load and the computer heats up. What's the recommended way of using this?

johannesjo avatar May 22 '20 22:05 johannesjo

Are you already using ::getChanges/::applyChanges to avoid sending the whole doc over the wire?


Since automerge doesn't make any assumptions about how the changes are distributed, it seems pretty tricky to come up with a general solution. Consider:

  1. C shares the doc with a new actor D
  2. D goes offline
  3. A makes an update and synchronizes with B and C

From A's perspective it has never merged with D, and the oldest common update is A's last change, which D never received. If A truncates here, any changes D has are now unmergeable with A.

Maybe actors could be included in the document state, then C sharing with D could be used as a marker in history preventing truncation. So in that case A would only truncate up to what D's local state started as. Now if D was Alice's phone, which went offline because it dropped in the toilet, truncation could never go beyond that point.

I imagine this could be solved at the application level, e.g. with a pop-up saying 'Alice's phone hasn't synced in a while, should we continue to wait?', and the user could figure out through other channels what happened to Alice's phone and whether to consider it a terminal case.
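To make the idea concrete, here is a minimal sketch of how a safe truncation point could be computed if the application tracked, for every known device, the vector clock that device last acknowledged. None of this is Automerge API: `stableFrontier`, `deviceClocks` and `abandonedDevices` are hypothetical names, and Automerge itself does not keep this per-device state.

```javascript
// Hypothetical sketch: none of these names are part of Automerge's API.
// deviceClocks maps a device name to the vector clock it last acknowledged,
// where a vector clock maps actorId -> highest sequence number seen.
// abandonedDevices holds devices the user has declared lost for good.
function stableFrontier(deviceClocks, abandonedDevices = new Set()) {
  const clocks = Object.entries(deviceClocks)
    .filter(([device]) => !abandonedDevices.has(device))
    .map(([, clock]) => clock)
  if (clocks.length === 0) return {}

  // Every actor mentioned by any device that is still being waited for.
  const actors = new Set(clocks.flatMap(clock => Object.keys(clock)))

  const frontier = {}
  for (const actor of actors) {
    // A device that has never seen this actor counts as sequence 0.
    frontier[actor] = Math.min(...clocks.map(clock => clock[actor] || 0))
  }
  return frontier
}

const deviceClocks = {
  A: { A: 5, B: 3, C: 2 },
  B: { A: 5, B: 3, C: 2 },
  C: { A: 5, B: 3, C: 2 },
  D: { A: 2, B: 1, C: 2 }, // D went offline right after C shared the doc
}

console.log(stableFrontier(deviceClocks))                 // { A: 2, B: 1, C: 2 }
console.log(stableFrontier(deviceClocks, new Set(['D']))) // { A: 5, B: 3, C: 2 }
```

Changes at or below the returned frontier have been seen by every device that is still being waited for. Marking D as abandoned, for example after the user confirms the pop-up, lets the frontier advance to what A, B and C share.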

dropofwill avatar May 23 '20 02:05 dropofwill

You're right about that (I probably should have written a word or two). The idea was to include actors in the document state. One approach could be that every node needs to record when its state is forked. Another could be a round trip between syncing actors, to ensure the data has actually been transmitted.

I am pretty new to the subject of CRDTs, but from the perspective of a 'normal' user I think it can be absolutely reasonable to delete or ignore their changes when they haven't synced for over a month. In that case it might even be what they prefer, as they've likely forgotten what they changed in the first place.

Another way out might be to give developers control over this by providing a makeThisMasterStateAndDeleteHistoryForever() method (a name megalomaniac enough to serve as either a warning or an encouragement). All older or conflicting states would simply be ignored when syncing with an actor that has executed that method.
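One way such a method could behave is sketched below, assuming each replica carries an epoch counter that is bumped on every reset. This is purely illustrative: `resetHistory`, `sync` and the `{ epoch, data }` shape are hypothetical, and the same-epoch branch uses a shallow object spread only as a stand-in for a real CRDT merge.

```javascript
// Hypothetical sketch, not Automerge API: a replica is { epoch, data }.
function resetHistory(replica) {
  // Keep only a copy of the current data and start a new epoch.
  return {
    epoch: replica.epoch + 1,
    data: JSON.parse(JSON.stringify(replica.data)),
  }
}

function sync(local, remote) {
  // The remote side has reset since we last talked: adopt its state wholesale.
  if (remote.epoch > local.epoch) return remote
  // The remote side predates our reset: its state is ignored.
  if (remote.epoch < local.epoch) return local
  // Same epoch: a real implementation would do a proper CRDT merge here.
  return { epoch: local.epoch, data: { ...local.data, ...remote.data } }
}

const phone = { epoch: 0, data: { title: 'old title' } }
const laptop = resetHistory({ epoch: 0, data: { title: 'new title' } })

console.log(sync(phone, laptop).data.title) // 'new title'
console.log(sync(laptop, phone).data.title) // 'new title'
```

Two replicas that reset independently would land in the same epoch with unrelated states, which this sketch does not handle.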

johannesjo avatar May 23 '20 08:05 johannesjo

I would second @johannesjo's suggestion of providing an API to "break with the past". While infinite history is a great ideal, time and space are limited, and end users do understand this.

As an analogy, Mac Time Machine backups eventually exhaust all available "history space" on the external disk, and the user is bluntly told so. "Your history only goes back to last September" or something.

One kind of API could conceivably allow specifying a space or (wall clock) time constraint, under user control, which would be exposed in the end user app as a preference for their project file/workspace. For example "limit project to X megabytes", or "limit project to 12-month history".

Alternatively a simpler API might scan the document and return a list of changes with (wall clock) time and size, for the end user app to expose an interface that lets her manually pick the tradeoff. This would have to work in conjunction with an API that removes changes up to a designated change.
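A rough sketch of how an app could turn such a list into a truncation point, given a size limit, an age limit, or both. The `{ time, size }` change records and the `firstChangeToKeep` helper are assumptions made for illustration; Automerge does not expose this list in this form.

```javascript
// Hypothetical sketch: `changes` is ordered oldest first, each entry being
// { time, size } with time in milliseconds since the epoch and size in bytes.
function firstChangeToKeep(
  changes,
  { maxBytes = Infinity, maxAgeMs = Infinity, now = Date.now() } = {}
) {
  let bytes = 0
  let keepFrom = changes.length
  // Walk from the newest change backwards, keeping changes while both limits hold.
  for (let i = changes.length - 1; i >= 0; i--) {
    const { time, size } = changes[i]
    if (now - time > maxAgeMs || bytes + size > maxBytes) break
    bytes += size
    keepFrom = i
  }
  return keepFrom // changes[0 .. keepFrom - 1] would be dropped
}

const DAY = 24 * 60 * 60 * 1000
const now = Date.UTC(2020, 6, 4)
const changes = [
  { time: now - 400 * DAY, size: 500 },
  { time: now - 200 * DAY, size: 300 },
  { time: now - 10 * DAY, size: 200 },
  { time: now - 1 * DAY, size: 100 },
]

console.log(firstChangeToKeep(changes, { maxAgeMs: 365 * DAY, now })) // 1
console.log(firstChangeToKeep(changes, { maxBytes: 400, now }))       // 2
```

The returned index is the "designated change": everything before it would be handed to whatever API removes old changes.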

The history truncation would have to be detected so that when an old document ("device D") comes online 13 months later, the user is told it is too old to merge.
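The detection step could look roughly like this, assuming the truncating side remembers the vector clock up to which it discarded changes. `canStillMerge` and `truncatedUpTo` are hypothetical names, not part of Automerge.

```javascript
// Hypothetical sketch. A vector clock maps actorId -> highest sequence number seen.
// `truncatedUpTo` is the clock at or below which changes were discarded locally.
function canStillMerge(remoteClock, truncatedUpTo) {
  // The remote document must already contain every discarded change,
  // because those changes can no longer be sent to it.
  return Object.entries(truncatedUpTo)
    .every(([actor, seq]) => (remoteClock[actor] || 0) >= seq)
}

const truncatedUpTo = { A: 5, B: 3 }

console.log(canStillMerge({ A: 7, B: 3, D: 1 }, truncatedUpTo)) // true
console.log(canStillMerge({ A: 2, B: 1, D: 4 }, truncatedUpTo)) // false: "device D" is too old
```

When the check fails, the app can tell the user that the old document is too old to merge instead of attempting a merge that cannot succeed.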

duncanwilcox avatar Jul 04 '20 13:07 duncanwilcox

The best thing would be if we could reduce the size of the history. But even then, for a scenario like ours, where multiple people collaborate on a document, our approach when they end their session (clean exit) is to take the data the document represents at that point and reinitialize it to save space in our DB.

Our near-term band-aid solution:

const Automerge = require('automerge')
const _ = require('lodash')

// doc1 is the document with history; serializedDoc is a placeholder for your own saved document
const doc1 = Automerge.load(serializedDoc)

// At the end of the collaboration, which is a tricky thing to determine in itself
// (if you find a good way to identify the end of a collaborative session, let us know):
const dataWithoutHistory = _.cloneDeep(doc1)

// This assumes you know how to clone your data; lodash works pretty well for general scenarios
// (as Automerge keys are not enumerable, you can safely exclude anything from Automerge).

// The new document starts over with a fresh history holding only the current state
const docWithoutHistory = Automerge.from(dataWithoutHistory)

mkhanal avatar Aug 24 '20 07:08 mkhanal