automerge-classic icon indicating copy to clipboard operation
automerge-classic copied to clipboard

Modeling RichText with Automerge

Open Gozala opened this issue 4 years ago • 39 comments

I would to figure out / invite to collaborate on a best practice for modeling a state for the rich text editor. Most popular rich text editors on the web (ProseMirror, Quill, Slate) use some for of tree representation. Quill is an exception as it represents document as array of formatted segments, but is still a subject to the same problem - One of the actors needs to split formatted segment of text (either insert something with a different format, or reformat) which ends up translating to deleting slice & inserting it afterwords, problem being that identity of the deleted & reinserted slice has different identity & there for any edits by other actors to the original slice no longer correspond to a new one.

I have discussed this to a greater length on slack but the best solution I could came up to was to represent rich text as sequence of tokens where tokens are either characters or formatting markers.

For example following state corresponds to:

{format:{italic:true}}Hello {format:{bold:true, italic:true}}beautiful{format:null} world\n

Following markdown:

**hello _beautiful_** wold

That is:

hello beautiful wold

This way splitting a formatted segment just means inserting a marker and causality is preserved, however it comes with a pain of mapping from / to text editor data model & making sure that overlapping markers are merged etc.. Furthermore Automerge.Text no longer seems to fit the bill.

I also have considered storing markers (offset, length as counters) separate from text but that seems even more troublesome so I have opted out.

I would very much love something like Automerge.RichText that bakes some some of these opinions internally and exposes higher level API like Automerge.Text but with methods like format(offset, length, format).

P.S.: I'm also working on above mentioned RichText` API, but also feel like discussing it here is probably a good idea.

Gozala avatar Jul 31 '19 04:07 Gozala

Are you writing a new rich-text editor or trying to integrate with an existing one? I think the best approach here is going to be determined by that question.

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

The automerge-slate repo has an example of the pain in translating another text editor's OT-style into Automerge, and it models the document as a tree.

Having done a lot of markup manipulation in past lives I'm somewhat nervous about inventing new representations without doing a bunch of extra research but maybe I can spare some time to dig in on what you're working on.

pvh avatar Jul 31 '19 16:07 pvh

Are you writing a new rich-text editor or trying to integrate with an existing one? I think the best approach here is going to be determined by that question.

Right now I'm integrating with quill but I think same general approach would work across the board & likely would apply even if I rolled out my own editor (which I don't want to do).

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

I have non rich text with cursors already which is actually fairly easy, they are stored separately because they don't get concurrent updates only actor owning that cursor does.

The automerge-slate repo has an example of the pain in translating another text editor's OT-style into Automerge, and it models the document as a tree.

I've seen it, but I think it suffers from the outlined problem.

Having done a lot of markup manipulation in past lives I'm somewhat nervous about inventing new representations without doing a bunch of extra research but maybe I can spare some time to dig in on what you're working on.

I'm nervous myself, also why I started this issue. It does seem that y-js author came to the same conclusion as myself in regards to marker tokens

  • https://github.com/y-js/y-richtext
  • https://stackoverflow.com/questions/38512294/which-crdts-can-be-used-to-implement-a-full-featured-collaborative-rich-text-edi#40208717

That alone obviously does not mean much, but I also have not found anything else other than RGASplit (see #195) which may offer a way to deal with outlined problem.

Mostly I opened to invite others (more qualified than myself) to offer some thoughts, feedback, ideas.

Gozala avatar Jul 31 '19 16:07 Gozala

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

Actually you're right, I was thinking of cursor as single position (which is easy to update base where inserted occurred before / after) but if you extend that to selections that becomes a lot more tricky.

Gozala avatar Jul 31 '19 17:07 Gozala

Definitely worth considering how moving text will work too.

pvh avatar Jul 31 '19 17:07 pvh

Definitely worth considering how moving text will work too.

You mean cut & paste ?

Gozala avatar Jul 31 '19 17:07 Gozala

Side remark: this echos quite a bit constraints I had in https://github.com/gozala/dominion where node moves were preferred over remove / insert. I end up wish something like register where I could stash nodes and then insert them from. Obviously doing this in collaborative setting seems far more challenging.

Gozala avatar Jul 31 '19 17:07 Gozala

I was hoping to get a basic demo working of what I was trying to build before sharing, but I'm also happy to do it much sooner as potentially it may save me from working towards the dead end

Gozala avatar Jul 31 '19 17:07 Gozala

I have spent a while thinking about this too, and I also think that a single document sequence with marker characters is the way to go. If you represent a document as a tree, there are a lot of operations that require deleting and re-inserting nodes (e.g. hitting enter in the middle of a paragraph, causing it to split into two paragraphs), which don't merge well in a concurrent setting. If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence.

My intention was that Automerge.Text should be usable for this purpose. That is, I want Automerge.Text to be able to contain marker objects as well as characters from the text, along the lines of:

new Automerge.Text(['h', 'e', 'l', 'l', 'o', ' ', {bold:true}, 'w', 'o', 'r', 'l', 'd', {bold:false}])

At the moment I think that's not working, but that's a bug that should be fixed. And I think it's a good idea to create an Automerge.RichText, which I think could be done as a layer on top of Automerge.Text. Perhaps RichText could create a projection of the sequence of characters and marker characters, and map it into the tree structure used by rich text editors.

Regarding cursor positions, the safest would be to use the internal element ID to refer to a position within the text, since it remains stable, even as people insert and delete text before and after the position. The Text class has a .getElemId(index) method that looks up the element ID for a particular index; the translation in the opposite direction doesn't seem to be exposed in the API at the moment, but that would be easy to add.

ept avatar Jul 31 '19 19:07 ept

Regarding cursor positions, the safest would be to use the internal element ID to refer to a position within the text, since it remains stable, even as people insert and delete text before and after the position. The Text class has a .getElemId(index) method that looks up the element ID for a particular index; the translation in the opposite direction doesn't seem to be exposed in the API at the moment, but that would be easy to add.

What happens if the element cursor was at got removed ? Would index somehow be restored via tombstone ?

Would you do the same for selections, meaning element IDs for begin / end indexes ?

This makes me also wonder what if actor receives cursor / selection for anchored to elements that are not (yet) present in the document.

Gozala avatar Jul 31 '19 19:07 Gozala

My intention was that Automerge.Text should be usable for this purpose. That is, I want Automerge.Text to be able to contain marker objects as well as characters from the text, along the lines of:

new Automerge.Text(['h', 'e', 'l', 'l', 'o', ' ', {bold:true}, 'w', 'o', 'r', 'l', 'd', {bold:false}])

For what it's worth I chose to avoid open / close style markers that introduce complication due to nesting

{bold:true}hello {italic:true}wold{italic:false}{bold:false}

with a markers that imply flat structure:

{bold:true}hello {bold:true, italic:true}wold{}

I'd be interested to hear arguments in favor of the nested approach assuming there are some.

Gozala avatar Jul 31 '19 19:07 Gozala

What happens if the element cursor was at got removed ? Would index somehow be restored via tombstone ?

Yes, currently tombstones are retained indefinitely anyway, so the element ID resolution will work even after the element has been deleted. And yes, representing ranges as a start-ID and end-ID should work fine.

ept avatar Jul 31 '19 19:07 ept

For what it's worth I chose to avoid open / close style markers that introduce complication due to nesting

I agree with that: in a concurrent editing setting it will probably not be possible to ensure proper nesting, so whatever thing maps the markers into a DOM-like tree structure will have to handle arbitrarily ordered markers. It seems better to think of the markers as state transitions ("stuff from here onwards is bold! stuff from here onwards is non-bold!") rather than opening and closing tags.

ept avatar Jul 31 '19 19:07 ept

Maybe I am overlooking something, but "If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence." means we can not express tree structures. I can not imagine WYSIWYG without at least an anchor which has to be split as well.

steida avatar Jul 31 '19 23:07 steida

Maybe I am overlooking something, but "If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence." means we can not express tree structures. I can not imagine WYSIWYG without at least an anchor which has to be split as well.

Depends on the editor, for example Quill treats every line of text as paragraph so there model would really be just inserting \n.

However it is true that in some cases you may need something more more complex that is insert \n and copy of of the formatting attributes. However most rich text editors would handle block splitting logic for you, so you'll just need to map it back to RichText model as discussed above.

Gozala avatar Jul 31 '19 23:07 Gozala

I've published my current draft of the discussed RichText exploration here https://github.com/Gozala/conspire/

At the moment I'm using array / list instead of Text because of the issues discussed in #194, but hopefully I'll be able to switch back.

Concurrent selections are broken right now, but shouldn't be too hard to fix once #198 is fixed.

Interesting / relevant bits are RichText class that just wraps Automerge.Text (well array currently). And a QuillRichText subclass that adds patch(delta) method which takes Quill Delta and applies changes to the RichText instance and toContent() method that encodes RichText state into Quill.Delta format (which is also how quill represents editor state) so it can be updated accordingly.

Gozala avatar Aug 01 '19 00:08 Gozala

Here’s one way to do a link

before link { link: { to: "https://github.com", target: "_blank", etc } }link here{ link: "end" } after link

j-f1 avatar Aug 02 '19 17:08 j-f1

Here’s one way to do a link

before link { link: { to: "https://github.com", target: "_blank", etc } }link here{ link: "end" } after link

That is more or less what my current implementation encodes to, following to be precise:

before link {attributes:{link:"https://github.com"}, length:0}link here{attributes:{}} after link

P.S.: {attributes, length:0} is used for markers, but there are also notion of embeds e.g. {embed:{img:"https://..."}, attributes: {...}, length: 1 }

Gozala avatar Aug 02 '19 18:08 Gozala

Okay, I've been working on this problem a bit and I think the current design is not going to work, but that we might have a way forward.

The fundamental problem is that an inline-control-characters design needs to apply formatting to spans either in some order (left-to-right?) or tries to consolidate all local formatting operations based on local knowledge... but can't predict what other users are doing.

This means that we either lose causal ordering in the first case, or fidelity in the second. Fortunately, we have a system already present in automerge that can manage keeping things ordered during merges: automerge.array!

My new design theory is that we should keep an ordered array of formatting spans. Each span will refer to a start and end character and will be applied in-order on the user's machine when the document is materialized. This is potentially slow, but should at least provide correct results and we can worry about optimization next.

pvh avatar Oct 21 '19 18:10 pvh

Important: an Ordered Array of Formatting Spans will henceforth be known as the OAFS model.

pvh avatar Oct 21 '19 18:10 pvh

My new design theory is that we should keep an ordered array of formatting spans. Each span will refer to a start and end character and will be applied in-order on the user's machine when the document is materialized

Is that separate array from the text characters? And I presume it refers to start & end chars by an id isn’t it ?

What happens when span range needs to change either grow or shrink ? I don’t know what you have in mind but I’m suspecting that If removing a span is ever necessary that is going to lead to some errors.

Gozala avatar Oct 21 '19 19:10 Gozala

The structure will look like this, if you imagine a bold operation on "cada", and unbold of "raca" below:

['abracadabra']: Automerge.Text
[ 
  [<character_id for 'c' character>, <character_id for 'b' character>, { bold: true },
  [<character_id for 'r' character>, <character_id for 'd' character>, { bold: false },
]

pvh avatar Oct 21 '19 19:10 pvh

@pvh I think primary problem with just computed formatting dictionaries (in front position only & no span length) is that could lead to multiple such dictionaries in the same position or is there something else ?

If that is only problem is there no way to handle that differently like collapse such dictionaries ? At the camp I remember we were talking about an idea of having a special operation like insertion of empty dict that would only insert one if it’s not already there.

Doing it in user space I’d imagine that by having separate dict where id of element prior to control char is used as a key and dict and as value so that all formatting attrs are changing same dict and (presumably) avoid multiple formatters next to each other.

Alternative user space solution I’ve considered was to use leftmost formatting dicts as the canonical and let actor that created second formatting dict do the merge + removal which I believe would converge

Gozala avatar Oct 21 '19 19:10 Gozala

Yes, you need to collapse them. You can see a non-causal implementation here: https://github.com/inkandswitch/pushpin/blob/quill-rich-text/src/renderer/TextDelta.ts

The issue is that you can't (in an open system) know who or what is doing what formatting out there. I think this approach is likely a significant improvement in fidelity of intent preservation.

pvh avatar Oct 21 '19 20:10 pvh

@pvh pointed major flaw with my approach on the call other day, that worth posting for anyone reading this, including future myself who already forgot this. So issue with collapsing all formatting into dictionaries is following:

Alice

v-a1

hello world!

Alice makes change to v-a2

hello {bold:true}world{}!

hello world!

Bob

Bob starts from v-a1

hello world!

And makes change v-b2

{italic:true}hello world!{}

hello world!

Merge

Once changes a2 & b2 reach both peers they converge to

{italic:true}hello {bold:true}world{}!{}

hello world!

Instead of intended

{italic:true}hello {bold:true, italic:true}world{italic:true}!{}

hello world!

That is because when each participant injected new formatting rules neither was aware of formatting that had to be copied over. So all the formatting needs to be preserved in data structure and collapsed on the fly, although I suspect we'd need to have collapsed cache for responsive local edits and more expensive re-computations could be performed when receiving changes from others, which might be deferred to be performed when local edits idle.

Gozala avatar Nov 04 '19 17:11 Gozala

I would also like to encourage someone better qualified than myself to take a look at xi-editor text representation which seems to employ interesting approach that is promising fast operation and efficient memory representation while keeping append-only history.

Gozala avatar Nov 04 '19 18:11 Gozala

@pvh I was reading on https://braid.news/ other day which took me to sync9 performance overview that GC semantics via acknowledgement messages. I have not looked into details, but I'm guessing it's similar to idea I have being entertaining in my head for quite some time - let an actor propose compaction block with pointers to the heads of all participants. If all the other actors agree / acknowledge history prior to compaction could be deleted.

However I believe there is contingency with approach you were proposing - storing formatting information as a separate structure that refers to ranges based on character id's with in the text. Would not that imply that history can never be removed ? That being I'm not sure if current design of automerge already precludes that constraint & maybe that is irrelevant.

All this got me wondering if instead of anchoring fragments to text identifiers they were anchored to the offsets instead which could be adjusting on inbound operations. This starts to sound similar to what Xi Text engine does from my recollection, but I'd need to read that over to confirm.

Just wanted to throw this over here.

Gozala avatar Nov 14 '19 18:11 Gozala

Acknowledgement messages are interesting, but acknowledgement only works if you know who is out there. In other words, you have to know the set of other users in the universe to safely compact which is tricky. If you're willing to simply move forward without knowing who else is out there then you don't need acknowledgements. You just post a compaction message and anyone else who wants to keep collaborating with you will have to "rebase" work they want to keep on top of that or be dropped.

I'm not too sure about the cost of tombstones in the rich text formatting. Presumably if you were compacting you could rectify those markers as well, but once you start doing offset math you're kind of back in OT land which is... fine? but reintroduces a lot of the complexity we've been trying to avoid I think.

Maybe @ept will have some input here.

pvh avatar Nov 14 '19 19:11 pvh

Update: @pvh's initial implementation of the OAFS model is now in test/text_test.js as of #238. We can discuss later whether to make it available as part of Automerge, or whether it should be a separate layer on top of it.

ept avatar Mar 23 '20 12:03 ept

@Gozala I am curious if you think cases like https://github.com/automerge/automerge/issues/193#issuecomment-549470392 can be avoided by using markers to represent picking up and dropping attributes, rather than representing the state of all attributes:

Alice

v-a1

hello world!

Alice makes change to v-a2

hello {pickup:['bold']}world{drop:['bold']}!

hello world!

Bob

Bob starts from v-a1

hello world!

And makes change v-b2

{pickup:['italic']}hello world!{drop:['italic']}

hello world!

Merge

Once changes a2 & b2 reach both peers they converge to

{pickup:['italic']}hello {pickup:['bold']}world{drop:['bold']}!{drop:['italic']}

hello world!

Potential improvements would be to allow picking up the adoption or removal of attributes, and then to drop a reference to the pickup, e.g. {pickup: {bold: false}, id: 1} followed by {drop: 1}. Essentially you're building a little stack/LWW register per-attribute based on the orderable IDs of pick-ups.

For example, I think @pvh's "abracadabra" edits could be represented by (assuming the formatting operations can have unique, orderable IDs (using integers here just to express the idea), so a site with both format operations knows not to apply 1 since 2 is a younger operation for the bold format—it would, however, remain in a stack so that it's picked up once 2 is dropped):

ab{pickup: {bold: false}, id: 2}ra{pickup: {bold: true}, id: 1}ca{drop: 2}da{drop: 1}bra

These obviously don't directly map to anything like a valid DOM tree, but presumably the view layer could handle the translation to the final state:

<p>abraca<b>da</b>bra</p>

jclem avatar Jun 29 '20 13:06 jclem

In case this helps anyone: I had a ton of trouble using markers, so I just decided to stick the attribute object on each letter (definitely space inefficient, but it works!*). The sample code provided in test/text_test.js wasn't working properly after certain combinations of formatting (like you bold 2 letters, then unbold 1, then bold the same 2 again.. odd combinations like that would mess up the text). I re-worked that code a bit and got it working sticking the attribute object on each letter here. I'm sure there's a clean, more efficient, and safe algorithm that uses markers, but for anyone seeking a quick and dirty solution, here it is :)

Edit*: I had a small one line bug in my original implementation too that would sometimes mess up copy pasting large sections. Should be fixed in the code linked, but please be warned it's not rigorously tested code!

j-berman avatar Feb 06 '21 11:02 j-berman