Add grapheme cluster support

Open jmacdonald opened this issue 7 years ago • 0 comments

Scribe's coordinate system is designed as an abstraction over multi-byte "characters", such that a Range spanning one offset corresponds to a single on-screen character, even if that character is represented by more than one byte. Currently, that abstraction is naively centered on Unicode code points (as encoded in UTF-8). However, a single on-screen character can be composed of multiple code points (for example, a base letter followed by a combining accent), and as a result, working with data that contains such characters breaks much of Scribe's data handling.

A Unicode grapheme cluster is what we should be using as the smallest atomic unit of text. The unicode-segmentation crate provides iterators over grapheme clusters rather than code points; let's migrate to that so that the coordinate system handles every on-screen character correctly, including those composed of multiple code points.
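To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper names (`byte_count`, `code_point_count`) and the sample constants are made up for illustration and are not part of Scribe's API. It compares the two spellings of "é", which look identical on screen:

```rust
/// "é" as a single precomposed code point (U+00E9).
/// Illustrative constant, not part of Scribe.
const PRECOMPOSED_E_ACUTE: &str = "\u{e9}";

/// "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
/// Renders as one on-screen character, but is two code points.
const DECOMPOSED_E_ACUTE: &str = "e\u{301}";

/// Number of UTF-8 bytes used to encode `text`.
fn byte_count(text: &str) -> usize {
    text.len()
}

/// Number of Unicode code points in `text` (Rust `char`s).
/// This is the unit the issue describes the current abstraction as using.
fn code_point_count(text: &str) -> usize {
    text.chars().count()
}
```

Here `code_point_count(PRECOMPOSED_E_ACUTE)` is 1 while `code_point_count(DECOMPOSED_E_ACUTE)` is 2, so an offset measured in code points would land between the letter and its accent in the second form. Counting with the crate's grapheme iterator instead, e.g. `UnicodeSegmentation::graphemes(text, true).count()`, should report a single cluster for both spellings.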

jmacdonald, Jan 21 '17 19:01