pycrdt

Unexpected text behavior for XmlText (possibly Text)

Open • armanckeser opened this issue 2 months ago • 1 comment

I think Rust indexes strings by UTF-8 bytes, whereas Python string indexing is character-based. So in Python:

s = "a😊b"
s[1:2]

Would give you "😊" (the emoji, a single "character").

In Rust, the emoji "😊" is encoded as 4 bytes in UTF-8. If Python's indices (which refer to characters) are passed directly to Rust, Rust will likely treat them as byte offsets, which can split a character in the middle and cause panics or data corruption.
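A minimal pure-Python illustration of the mismatch described above. It does not call pycrdt or the Rust backend; it only shows what happens when character indices are reused as UTF-8 byte offsets, and the variable names are just for illustration:

```python
s = "a😊b"

# Python indexes by code point: the string has three "characters".
assert len(s) == 3
assert s[1:2] == "😊"

# The UTF-8 encoding is six bytes: 1 for "a", 4 for the emoji, 1 for "b".
encoded = s.encode("utf-8")
assert len(encoded) == 6

# Reusing the character indices 1:2 as byte offsets grabs only the first
# byte of the emoji, which is not valid UTF-8 on its own.
fragment = encoded[1:2]
try:
    fragment.decode("utf-8")
    splits_character = False
except UnicodeDecodeError:
    splits_character = True
```

Here `splits_character` ends up `True`: the byte range 1:2 lands inside the emoji rather than covering it.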

This will break del s[1:2] or similar behavior in Python if the backing Rust code expects byte offsets.

Fixing this would likely be a breaking change, but it does seem like a bug: the Python ecosystem would expect deleting a range on the Python side to use character indexing rather than UTF-8 byte indexing.
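One possible shape for such a fix, sketched in pure Python under the assumption that the backend really does expect UTF-8 byte offsets (I have not confirmed that). The helpers `char_range_to_byte_range` and `delete_chars` are hypothetical and not part of pycrdt; a `bytearray` stands in for the Rust-side buffer:

```python
def char_range_to_byte_range(s: str, start: int, stop: int) -> tuple[int, int]:
    """Translate a character range into the matching UTF-8 byte range.

    Assumes 0 <= start <= stop <= len(s).
    """
    byte_start = len(s[:start].encode("utf-8"))
    byte_stop = len(s[:stop].encode("utf-8"))
    return byte_start, byte_stop


def delete_chars(s: str, start: int, stop: int) -> str:
    """Delete characters start:stop from a byte buffer using byte offsets."""
    byte_start, byte_stop = char_range_to_byte_range(s, start, stop)
    data = bytearray(s.encode("utf-8"))  # stand-in for the backend buffer
    del data[byte_start:byte_stop]
    return data.decode("utf-8")


result = delete_chars("a😊b", 1, 2)
```

With this translation, character range 1:2 becomes byte range 1:5, so the whole emoji is removed and `result` is `"ab"`.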

armanckeser • Oct 06 '25 20:10

Possibly related: https://github.com/y-crdt/pycrdt/pull/129.

davidbrochart • Oct 07 '25 07:10