lexical refactor: Implement CollabTextNode as a single yjs map

Original bug

Self-host lexical playground, enable collaboration, enable split view. Type some characters ("abc" in the example) Disconnect one of the view. Add some characters before "abc" in one view ("123" in the example) In another view, delete "abc", then add some new characters ("456" in the example) Reconnect the view again, now two views are not consistent.

https://github.com/facebook/lexical/assets/36890796/73bf6dc4-9769-4e87-844f-be5184098e7c

Expected behavior: Two views in sync.

Root cause analysis

In original implementation, a "LexicalTextNode" is encoded to as two parts, a yjs map and a piece of string.

For example, the character sequence "abc", is encoded as [Y.map, "a", "b", "c"]. The map at the beginning indicates the format of the character sequence , then followed by the LexicalTextNode's content. Therefore, if different format texts are concatenating together, then each of them would start with a Y.map, followed by their content respectively.

This implementation violates CRDT's rule.

In the example above, at the beginning, the sequence is like: [Y.map, a, b, c] Then, when the "123" is added, the diff is encoded as Insert at position 1, character 1, 2, 3 Then, when the "abc" is deleted, the diff is encoded as Delete sequence from position 0, length 4, then insert at position 0 [Y.map, 4, 5, 6] When the two views are connected again, those two diffs are getting merged.

For the view on the left (current [Y.map, 1, 2, 3, a, b, c]), the new diff becomes: Delete 1, retain 3, delete 3, add [Y.map, 4, 5, 6] Then it becomes [1, 2, 3, Y.map, 4, 5, 6], which is already broken since the leading Y.map is removed. It is shown as "23456".

For the view on the right(current [Y.map, 4, 5, 6]), the new diff becomes: insert [1, 2, 3] Then it becomes [1, 2, 3, Y.map, 4, 5, 6], not sure why it is converted to 12456. Possibly due to some inconsistence when converting delta to LexicalTextNode.

Similar cases can be created when disconnecting and then remove styled texts in a multi-format text sequence and added some new texts.

Refactor

To refactor, a new entry _text is added to CollabTextNode's map, that is a Y.Text class, so all the character changes are no longer handled by the CollabElementNode, and we do not encode the text as a leading prefix with characters. Now, each CollabTextNode is exactly one Y.map in the element node's XmlText class (with length 1), so all the intricate size computation can be removed.

Going back to the original bug, the correct behavior should be: Original: "abc" Left: "123abc" Right: "456" After merge: "456" Although it is not intuitive at the first glance, that is the correct behavior. Suppose this is not a text node, this is a list item. The first user adds some texts, the second user remove the item, then adds a new item with some texts, when they are merged, only the new item and new texts are kept, since the second user directly removes the container (the original list item).

Mar 10 '24 04:03 yf-yang

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
lexical	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Mar 10, 2024 10:38am
lexical-playground	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Mar 10, 2024 10:38am

Mar 10 '24 04:03 vercel[bot]

Hmmm, seems all the test cases should be also refactored. They are also following the original encoding.

Mar 10 '24 05:03 yf-yang

I don't know enough about Yjs and CRDTs. We need @fantactuka on this one.

Mar 10 '24 18:03 ivailop7

I stumbled upon this thread: https://discuss.prosemirror.net/t/differences-between-prosemirror-and-lexical/4557/8 where there are some comments on the choices for Yjs implementation. Thought it might be helpful, since I'm not sure if the changes in this PR adhere to the original idea with Yjs was approached with.

Mar 13 '24 23:03 ivailop7

Well, actually this PR does not change anything related to Dominic's original architecture design. The original idea is implementing an identical tree (CollabXxxNode) in yjs that reflects current Lexical tree, so sync between people are handled by yjs, and Lexical is responsible for syncing between CollabXxxNode and LexicalXxxNode. This PR has nothing to do with that fundamental idea.

The intent of the current PR is, specifically, CollabTextNode has some flaws, under certain circumstances is could break the synchronization between Lexical and yjs, so its own data structure (how it represents LexicalTextNode) is changed.

Mar 14 '24 01:03 yf-yang

I'll spend a bit more time on PR changes, but first concern is the change won't be backward compatible, and updating to a newer Lexical version will cause existing content corruption

Mar 15 '24 18:03 fantactuka

the change won't be backward compatible

Indeed🤔, we need to keep some code that converts legacy yjs representation to LexicalNode too. AFAIK there is no indicator of Lexical version in the yjs representation, right?

Mar 16 '24 03:03 yf-yang

As discussed here, yjs serialization should ideally be given by importJson or exportJson.

Once that's done, the recommended way to make this backward compatible would be in the serialization methods.

Apr 02 '24 02:04 GermanJablo

Once that's done, the recommended way to make this backward compatible would be in the serialization methods.

@GermanJablo Thank you for your attention. Here we are talking about another stuff, the backward compatibility issue is collabNode's backward compatibility issue, it has nothing to do with how Lexical and CollabNode are mapping to each other.

The main problem is, users may have persisted the collabNode directly and read from it (for example, y-indexeddb provider), so when Lexical is upgraded, all the legacy yjs document would be broken. Another senario would be that two different Lexical version app cannot sync with each other via a yjs connection provider.

Apr 07 '24 01:04 yf-yang

You're right, my bad.

Apr 07 '24 02:04 GermanJablo

lexical lexical copied to clipboard

refactor: Implement CollabTextNode as a single yjs map

Original bug

Root cause analysis

Refactor

lexical
lexical copied to clipboard