pycrdt
Fix Index Conversion from Text to TextRef
Hey @davidbrochart :wave:
I experimented with emojis in Text
and noticed that insertion behaves differently than expected:
🐍 test script
from pycrdt import Doc, Text
## setup
ydoc = Doc()
ytext = Text()
ydoc["text"] = ytext
state = "" # track state of ytext
def callback(event):
    """Print change record"""
    global state
    new_state = str(event.target)
    delta = str(event.delta)
    print(f"{delta}: '{state}' -> '{new_state}'")
    # update current state
    state = new_state
ytext.observe(callback)
## Manipulate Text
print("Insert and delete single emoji '🌴'")
# works as expected
ytext.insert(0, "🌴")
assert state == "🌴"
# the index passed here counts Unicode code points,
# but the delta reported to the callback counts UTF-8 bytes
del ytext[0:1]
assert state == ""
print("\nInsert '🌴abcde' sequentially")
for c, char in enumerate("🌴abcde"):
    ytext.insert(c, char)
assert state == "🌴abcde"
Insert and delete single emoji '🌴'
[{'insert': '🌴'}]: '' -> '🌴'
[{'delete': 4}, {'insert': ''}]: '🌴' -> ''
Insert '🌴abcde' sequentially
[{'insert': '🌴'}]: '' -> '🌴'
[{'retain': 4}, {'insert': 'a'}]: '🌴' -> '🌴a'
[{'retain': 4}, {'insert': 'b'}]: '🌴a' -> '🌴ba'
[{'retain': 4}, {'insert': 'c'}]: '🌴ba' -> '🌴cba'
[{'retain': 4}, {'insert': 'd'}]: '🌴cba' -> '🌴dcba'
[{'retain': 5}, {'insert': 'e'}]: '🌴dcba' -> '🌴decba'
In Python code, the index is given in Unicode code points; however:
TextRef structure internally uses UTF-8 encoding and its length is described in a number of bytes rather than individual characters.
So this PR adapts the given index to the UTF-8 encoded string. With the change, the same script prints:
Insert and delete single emoji '🌴'
[{'insert': '🌴'}]: '' -> '🌴'
[{'delete': 4}]: '🌴' -> ''
Insert '🌴abcde' sequentially
[{'insert': '🌴'}]: '' -> '🌴'
[{'retain': 4}, {'insert': 'a'}]: '🌴' -> '🌴a'
[{'retain': 5}, {'insert': 'b'}]: '🌴a' -> '🌴ab'
[{'retain': 6}, {'insert': 'c'}]: '🌴ab' -> '🌴abc'
[{'retain': 7}, {'insert': 'd'}]: '🌴abc' -> '🌴abcd'
[{'retain': 8}, {'insert': 'e'}]: '🌴abcd' -> '🌴abcde'
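The retain values above match the UTF-8 byte offsets of the code point positions. As a minimal sketch in plain Python (independent of pycrdt, and not the code in this PR), the mapping could look like this; the helper name `codepoint_to_utf8_offset` is hypothetical and only handles non-negative indices:

```python
def codepoint_to_utf8_offset(text: str, index: int) -> int:
    """Return the UTF-8 byte offset of code point position `index` in `text`.

    Hypothetical helper for illustration only; it is not part of pycrdt
    and assumes a non-negative index.
    """
    # Encode everything before the position and count the resulting bytes.
    return len(text[:index].encode("utf-8"))


text = "🌴abcde"
# The emoji occupies 4 bytes, every following ASCII character 1 byte.
offsets = [codepoint_to_utf8_offset(text, i) for i in range(len(text))]
print(offsets)  # [0, 4, 5, 6, 7, 8]
```

Encoding the prefix is O(n) per call, which is fine for a sketch but may matter for very long texts.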
However, I am not sure how to handle the numbers that a TextEvent reports in event.delta, as they are also based on the UTF-8 encoded form and can therefore be off relative to the Python string representation. (My use case: keeping Text in sync with the contents of the Textual TextArea widget.)
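If the conversion were left to user code, one possible approach is sketched below. It assumes that the retain and delete counts are UTF-8 byte lengths that always fall on character boundaries (as the output above suggests), and that the previous text is known to the caller. The function name `delta_bytes_to_codepoints` is hypothetical and not part of pycrdt.

```python
def delta_bytes_to_codepoints(delta: list[dict], old_text: str) -> list[dict]:
    """Convert byte-based 'retain'/'delete' counts to code point counts.

    Hypothetical helper: assumes the counts are UTF-8 byte lengths
    measured against `old_text` and aligned to character boundaries.
    """
    data = old_text.encode("utf-8")
    pos = 0  # current byte position in the old text
    converted = []
    for op in delta:
        new_op = dict(op)  # keep any other keys (e.g. attributes) untouched
        for key in ("retain", "delete"):
            if key in op:
                n_bytes = op[key]
                chunk = data[pos:pos + n_bytes]
                # Decoding the slice gives the number of code points it covers.
                new_op[key] = len(chunk.decode("utf-8"))
                pos += n_bytes
        converted.append(new_op)
    return converted


print(delta_bytes_to_codepoints([{"retain": 5}, {"insert": "b"}], "🌴a"))
# [{'retain': 2}, {'insert': 'b'}]
```

Inserted strings need no conversion, since they are already Python strings.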
Should users handle that in their own code? Should Text try to report the numbers in terms of the Python string representation? Or should Text be capable of handling rich text, as TextRef does:
TextRef offers a rich text editing capabilities (it’s not limited to simple text operations). Actions like embedding objects, binaries (eg. images) and formatting attributes are all possible using TextRef.
I also thought about limiting Text to inserted values for which len(val) == len(val.encode()), but this does not feel right to me.
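For reference, that condition holds exactly when the value is pure ASCII, because UTF-8 uses a single byte only for code points below 128. A short sketch (the function name `is_single_byte_only` is hypothetical) shows what such a restriction would reject:

```python
def is_single_byte_only(val: str) -> bool:
    """True if every character of `val` encodes to exactly one UTF-8 byte."""
    return len(val) == len(val.encode("utf-8"))


for val in ("abcde", "\u00e9", "🌴"):
    # str.isascii() (Python 3.7+) gives the same answer.
    print(repr(val), is_single_byte_only(val), val.isascii())
```

Accented letters such as "é" would be excluded along with emojis, which illustrates how restrictive the limit would be.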