automerge-classic Branching, Forks, and "Dangling Tips"

The default mode of operation for Automerge is to render all operations in the op_set into the output document.

For Trellis, we added a history view to crawl back through earlier states of the document and see both the change at that time and the state of the document.

Our new application adds a notion of asynchronous collaboration. That is to say, we don't always want to apply operations as soon as they're received from other users. This means we not only want to be able to see how the document appears for another user, but also to anticipate how it might appear if we merged their changes with our own.

As a result, our current solution involves creating a great many extra Automerge documents: one for each known peer editing a document, representing that peer's last-known state, plus further documents for speculative merges that preview what things might look like if I merged your changes into my work.

This leads to large amounts of wasted work and memory. Most of these documents will be identical modulo a few operations at any point in time, and the cost of creating copies is fairly high.

It would be nice if automerge had better support for structural sharing and an API for querying these kinds of document versions.

Feb 06 '18 19:02 pvh

To support these changes, we propose a few alterations to the Automerge API.

First, .change() should take an actor_id and some form of vector clock or cursor as input, rather than inferring them from the current document state, thus (at its most explicit):

let doc = Automerge.init()
let actor_id = 'A'
let a_view = Automerge.newView({'A': 'head'})
doc = Automerge.change(doc, actor_id, Automerge.currentClock(a_view), state => {
    state.title = 'A Verbose, but Unambiguous Interface'
})

To see what a different person's document might look like, we would create a new view (following on from the above code):

let peer_id = 'B'
let b_view = Automerge.newView({'B': 'head'})
b_view.title // "This Seems a Bit Much, Don't You Think?"

To preview what a merge between A and B might look like (let us assume for the moment that there are other branches / actors out there):

let merge_view = Automerge.newView({'A': 'head', 'B': 'head'})
merge_view.title // "A Verbose, but Unambiguous Interface"
merge_view.title._conflicts // ["This Seems a Bit Much, Don't You Think?"]

Finally, to merge the documents permanently, A would write a no-op commit including B's vector clock.

doc = Automerge.change(doc, actor_id, Automerge.currentClock(merge_view), state => {
    // no state changes required, so we could really omit this argument entirely
})

As an addendum, I'll propose in passing that history would vary from branch to branch, so the getChanges() call would take a view instead of a document. We might also want the ability to create a view from a specific numeric vector clock, which is why the newView() call described above takes a pseudo-clock with 'head' values: one may wish instead to pass in particular sequence numbers for particular actors. These pseudo-clocks need expanding into true clocks: resolve each 'head' value to the last sequence number written by that actor, look up the clock of that change, then union the clocks, taking the maximum sequence number for each actor.
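The pseudo-clock expansion described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `history` layout, and the helpers `clockOf`, `unionClocks` and `expandClock`, are hypothetical names and not part of Automerge, and each change's `deps` is assumed to already hold the full clock of everything that change had seen.

```javascript
// Hypothetical in-memory history: for each actor, the changes they wrote in
// sequence order. `deps` is assumed to be the full clock the change had seen
// from other actors. This is not Automerge's real internal layout.
const history = {
  A: [
    { seq: 1, deps: {} },
    { seq: 2, deps: { B: 1 } },
  ],
  B: [
    { seq: 1, deps: {} },
    { seq: 2, deps: { A: 1 } },
    { seq: 3, deps: { A: 1, C: 4 } },
  ],
}

// Clock of a single change: its dependencies plus the change itself.
function clockOf(history, actor, seq) {
  const change = (history[actor] || []).find(c => c.seq === seq)
  if (!change) throw new RangeError(`no change ${actor}:${seq}`)
  return { ...change.deps, [actor]: seq }
}

// Union two clocks, taking the maximum sequence number for each actor.
function unionClocks(a, b) {
  const out = { ...a }
  for (const [actor, seq] of Object.entries(b)) {
    out[actor] = Math.max(out[actor] || 0, seq)
  }
  return out
}

// Expand a pseudo-clock such as {A: 'head', B: 2} into a true clock.
function expandClock(history, pseudoClock) {
  let clock = {}
  for (const [actor, value] of Object.entries(pseudoClock)) {
    const changes = history[actor] || []
    // An actor at 'head' who has written nothing contributes nothing.
    if (value === 'head' && changes.length === 0) continue
    const seq = value === 'head' ? changes[changes.length - 1].seq : value
    clock = unionClocks(clock, clockOf(history, actor, seq))
  }
  return clock
}

const merged = expandClock(history, { A: 'head', B: 'head' })
const pinned = expandClock(history, { A: 1, B: 2 })
```

Note that the expanded clock can mention actors that were never named in the pseudo-clock (C in this example), because B's head change depends on them.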

Feb 06 '18 19:02 pvh

I think this is a great idea and a very useful feature. There are just some details about which I'm not yet sure.

How do you think the API for applying changes from a remote node should look? At the moment we have Automerge.applyChanges() which takes a document and a list of changes, and returns a new document with those changes applied. When we have multiple views/branches/forks of a document, a new problem arises: a change may need to be reflected in several views (namely, all views that are following the actor that generated the new change), but not all views.

The current immutable API doesn't lend itself well to updating several views at once, since each view is a separate immutable object. And having to manually apply the same change to several view objects would defeat the point of what you're looking for in this API.

So when you say:

let b_view = Automerge.newView({'B': 'head'})
b_view.title

does b_view implicitly have a reference to some underlying document object? Is that reference mutable (i.e. when the document is modified due to remote changes being applied, is the b_view object mutated to reflect those changes)?

To keep things immutable, one option would be to make a view a function that takes a document object. For example:

let doc = Automerge.load(...)
let b_view = Automerge.newView({'B': 'head'})
b_view(doc).title // "A Verbose, but Unambiguous Interface"

doc = Automerge.applyChanges(doc, network.getLatestChanges())
b_view(doc).title // may or may not have changed, depending on whether B made a change

Or we could always get a view from the document object by passing a vector clock:

let doc = Automerge.load(...)
let b_view = Automerge.getView(doc, {'B': 'head'})
b_view.title // "A Verbose, but Unambiguous Interface"

doc = Automerge.applyChanges(doc, network.getLatestChanges())
// changing doc does not change b_view, so we have to refresh the view
b_view = Automerge.getView(doc, {'B': 'head'})
b_view.title // may or may not have changed, depending on whether B made a change

What do you think?

Feb 09 '18 12:02 ept

Yes, we had discussed this problem but didn't have an obvious solution. I agree that we want to stay functional and immutable across the API.

I think on some level the first and second options are the same, except in the first you've already curried the cursor into the call of the second. I was contemplating whether sticking a view list into the document might be important. If it lives in there we can incrementally update the views as they're required rather than having them be, effectively, arbitrary queries.
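To make the currying point concrete, here is a rough sketch. The `getView` below is a toy stand-in for the proposed call (neither `getView` nor `newView` exists in Automerge today), and the "document" is just a plain object mapping each actor to their latest state.

```javascript
// Toy stand-in for the proposed Automerge.getView(doc, clock). A "doc" here
// is only a map from actor to that actor's latest state, and a view picks
// the first actor named in the clock. Purely illustrative.
function getView(doc, clock) {
  const actor = Object.keys(clock)[0]
  return doc[actor]
}

// Option one expressed in terms of option two: fix the clock now,
// supply the document later.
const newView = clock => doc => getView(doc, clock)

let doc = {
  A: { title: 'A Verbose, but Unambiguous Interface' },
  B: { title: "This Seems a Bit Much, Don't You Think?" },
}
const b_view = newView({ B: 'head' })
const before = b_view(doc).title

// An immutable update yields a new doc; the same view function still applies.
doc = { ...doc, B: { title: 'Edited by B' } }
const after = b_view(doc).title
```

Either way the view is an arbitrary query recomputed on each call, which is what prompts the question of keeping a view list inside the document.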

Thoughts on that?

Feb 09 '18 16:02 pvh

Yes, I think it would be a good idea for the document to have a list of views that get maintained incrementally, as that will allow for better performance optimisations. However, this means that view registration is itself something that affects the state of the document, and thus needs to return a new document instance. That would suggest an API like:

let doc = Automerge.load(...)
doc = Automerge.registerView(doc, {'B': 'head'})
let b_view = Automerge.getView(doc, {'B': 'head'}) // would raise exception if we hadn't called
                                                   // registerView({'B': 'head'}) beforehand
b_view.title // "A Verbose, but Unambiguous Interface"

Here, the object {'B': 'head'} is used both for the initial registration of the view, and for the subsequent retrieval. Alternatively, the registration could return an opaque "view handle" (a string or number):

let doc = Automerge.load(...), handle
[doc, handle] = Automerge.registerView(doc, {'B': 'head'})
let b_view = Automerge.getView(doc, handle)
b_view.title // "A Verbose, but Unambiguous Interface"

Do you have a preference for one over the other? The "handle" approach somehow feels like last-millennium C APIs to me…

Feb 12 '18 10:02 ept

Having given this a bit more thought over the last few days, the notion of having spooky mutation in an otherwise functional system is a bit distressing.

I was going to suggest that registerView might return the same document now configured for efficient querying of that particular view, whereas getView would take advantage of a registered view or else compute a result on the fly.

This gives us the option to have efficient views, the option of having ad-hoc views without the overhead of maintenance for them, and still preserve a pure functional interface.
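A minimal sketch of how that split might look, assuming a toy document shape (`states` plus a `views` cache) that is not how Automerge stores anything. The helper names `keyOf` and `computeView` are made up for illustration, and the "merge" is just a naive object merge rather than real conflict resolution.

```javascript
// Toy model, not Automerge internals: a doc holds per-actor state plus a
// frozen map of materialised views, keyed by a canonical form of the clock.
const keyOf = clock =>
  JSON.stringify(Object.keys(clock).sort().map(actor => [actor, clock[actor]]))

// Hypothetical on-the-fly computation: naively merge the named actors' states.
let computeCount = 0
function computeView(doc, clock) {
  computeCount += 1
  const parts = Object.keys(clock).sort().map(actor => doc.states[actor] || {})
  return Object.freeze(Object.assign({}, ...parts))
}

// Returns a NEW document configured for efficient querying of this view;
// the input document is left untouched.
function registerView(doc, clock) {
  const views = { ...doc.views, [keyOf(clock)]: computeView(doc, clock) }
  return Object.freeze({ states: doc.states, views: Object.freeze(views) })
}

// Uses a registered view when there is one, otherwise computes ad hoc.
function getView(doc, clock) {
  const cached = doc.views[keyOf(clock)]
  return cached !== undefined ? cached : computeView(doc, clock)
}

const doc0 = Object.freeze({
  states: { A: { title: 'From A' }, B: { title: 'From B' } },
  views: Object.freeze({}),
})
const doc1 = registerView(doc0, { B: 'head' })
const registered = getView(doc1, { B: 'head' }) // served from doc1.views
const adHoc = getView(doc1, { A: 'head' })      // computed on the fly
```

In a fuller version, applying changes would presumably also return a new document with its registered views updated incrementally; that part is left out here.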

Feb 14 '18 23:02 pvh
