lexical
lexical copied to clipboard
refactor: Implement CollabTextNode as a single yjs map
Original bug
Self-host lexical playground, enable collaboration, enable split view. Type some characters ("abc" in the example) Disconnect one of the view. Add some characters before "abc" in one view ("123" in the example) In another view, delete "abc", then add some new characters ("456" in the example) Reconnect the view again, now two views are not consistent.
https://github.com/facebook/lexical/assets/36890796/73bf6dc4-9769-4e87-844f-be5184098e7c
Expected behavior: Two views in sync.
Root cause analysis
In original implementation, a "LexicalTextNode" is encoded to as two parts, a yjs map and a piece of string.
For example, the character sequence "abc", is encoded as [Y.map, "a", "b", "c"]. The map at the beginning indicates the format of the character sequence , then followed by the LexicalTextNode's content. Therefore, if different format texts are concatenating together, then each of them would start with a Y.map, followed by their content respectively.
This implementation violates CRDT's rule.
In the example above, at the beginning, the sequence is like:
[Y.map, a, b, c]
Then, when the "123" is added, the diff is encoded as
Insert at position 1, character 1, 2, 3
Then, when the "abc" is deleted, the diff is encoded as
Delete sequence from position 0, length 4, then insert at position 0 [Y.map, 4, 5, 6]
When the two views are connected again, those two diffs are getting merged.
For the view on the left (current [Y.map, 1, 2, 3, a, b, c]), the new diff becomes:
Delete 1, retain 3, delete 3, add [Y.map, 4, 5, 6]
Then it becomes [1, 2, 3, Y.map, 4, 5, 6]
, which is already broken since the leading Y.map is removed. It is shown as "23456".
For the view on the right(current [Y.map, 4, 5, 6]), the new diff becomes:
insert [1, 2, 3]
Then it becomes [1, 2, 3, Y.map, 4, 5, 6]
, not sure why it is converted to 12456
. Possibly due to some inconsistence when converting delta to LexicalTextNode.
Similar cases can be created when disconnecting and then remove styled texts in a multi-format text sequence and added some new texts.
Refactor
To refactor, a new entry _text
is added to CollabTextNode
's map, that is a Y.Text
class, so all the character changes are no longer handled by the CollabElementNode
, and we do not encode the text as a leading prefix with characters.
Now, each CollabTextNode
is exactly one Y.map in the element node's XmlText class (with length 1), so all the intricate size computation can be removed.
Going back to the original bug, the correct behavior should be: Original: "abc" Left: "123abc" Right: "456" After merge: "456" Although it is not intuitive at the first glance, that is the correct behavior. Suppose this is not a text node, this is a list item. The first user adds some texts, the second user remove the item, then adds a new item with some texts, when they are merged, only the new item and new texts are kept, since the second user directly removes the container (the original list item).
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
lexical | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Mar 10, 2024 10:38am |
lexical-playground | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Mar 10, 2024 10:38am |
Hmmm, seems all the test cases should be also refactored. They are also following the original encoding.
I don't know enough about Yjs and CRDTs. We need @fantactuka on this one.
I stumbled upon this thread: https://discuss.prosemirror.net/t/differences-between-prosemirror-and-lexical/4557/8 where there are some comments on the choices for Yjs implementation. Thought it might be helpful, since I'm not sure if the changes in this PR adhere to the original idea with Yjs was approached with.
Well, actually this PR does not change anything related to Dominic's original architecture design. The original idea is implementing an identical tree (CollabXxxNode) in yjs that reflects current Lexical tree, so sync between people are handled by yjs, and Lexical is responsible for syncing between CollabXxxNode and LexicalXxxNode. This PR has nothing to do with that fundamental idea.
The intent of the current PR is, specifically, CollabTextNode has some flaws, under certain circumstances is could break the synchronization between Lexical and yjs, so its own data structure (how it represents LexicalTextNode) is changed.
I'll spend a bit more time on PR changes, but first concern is the change won't be backward compatible, and updating to a newer Lexical version will cause existing content corruption
the change won't be backward compatible
Indeed🤔, we need to keep some code that converts legacy yjs representation to LexicalNode too. AFAIK there is no indicator of Lexical version in the yjs representation, right?
As discussed here, yjs serialization should ideally be given by importJson
or exportJson
.
Once that's done, the recommended way to make this backward compatible would be in the serialization methods.
Once that's done, the recommended way to make this backward compatible would be in the serialization methods.
@GermanJablo Thank you for your attention. Here we are talking about another stuff, the backward compatibility issue is collabNode's backward compatibility issue, it has nothing to do with how Lexical and CollabNode are mapping to each other.
The main problem is, users may have persisted the collabNode directly and read from it (for example, y-indexeddb provider), so when Lexical is upgraded, all the legacy yjs document would be broken. Another senario would be that two different Lexical version app cannot sync with each other via a yjs connection provider.
You're right, my bad.