scribe
Add grapheme cluster support
Scribe's coordinate system is designed as an abstraction over multi-byte "characters", such that a Range spanning one offset corresponds to a single on-screen character, even if that character is represented by more than one byte. Currently, that abstraction is naively built around Unicode code points. However, a single on-screen character can be composed of multiple code points (for example, a base letter followed by a combining accent), so working with data that contains such characters breaks much of Scribe's data handling.
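To make the failure concrete, here is a minimal sketch of a code-point-based offset conversion. The helper `char_offset_to_byte` is hypothetical and only illustrates the naive approach; it is not Scribe's actual API. The string "é" is written as 'e' followed by U+0301 COMBINING ACUTE ACCENT, so it renders as one character but contains two code points.

```rust
/// Hypothetical helper mirroring a naive, code-point-based coordinate system:
/// converts a code-point offset within a line into a byte offset.
/// An offset equal to the number of code points maps to the end of the line.
fn char_offset_to_byte(line: &str, offset: usize) -> Option<usize> {
    line.char_indices()
        .map(|(byte, _)| byte)
        .chain(std::iter::once(line.len()))
        .nth(offset)
}

fn main() {
    // 'e' + U+0301: one on-screen character, two code points, three UTF-8 bytes.
    let line = "e\u{301}x";
    assert_eq!(line.chars().count(), 3);
    assert_eq!(line.len(), 4);

    // A range spanning code-point offsets 0..1 covers only the 'e' and
    // leaves the combining accent behind, splitting the visible character.
    let end = char_offset_to_byte(line, 1).unwrap();
    assert_eq!(&line[..end], "e");
    println!("offsets 0..1 select {:?}", &line[..end]);
}
```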
A grapheme cluster, as defined by Unicode, is what we should be using as the smallest atomic unit of text. The unicode-segmentation crate provides iterators over grapheme clusters rather than code points; let's migrate to it so that the coordinate system supports the full Unicode character set.