zed
Unicode Character Counts
Check for existing issues
- [X] Completed
Describe the feature
I'd like character counting to be based on Unicode characters rather than on bytes. This would greatly help non-English users get basic character counts right.
If applicable, add mockups / screenshots to help present your vision of the feature
For example:
hello
你好
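This should give a character count of (5 + 1 + 2) = 8 (five letters, one newline, two Chinese characters), instead of the current (5 + 1 + 2*3) = 12, which counts each Chinese character as the 3 bytes it takes in UTF-8.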
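To make the arithmetic above concrete, here is a small Rust sketch that lists each `char` of the example together with its UTF-8 encoded length, using only the standard library's `char::len_utf8`. The helper name `utf8_lengths` is hypothetical and only for illustration.

```rust
/// Hypothetical helper: pairs each char in `s` with its UTF-8 length in bytes.
fn utf8_lengths(s: &str) -> Vec<(char, usize)> {
    s.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    let text = "hello\n你好";
    for (c, n) in utf8_lengths(text) {
        println!("{:?}: {} byte(s)", c, n);
    }
    // ASCII letters and '\n' take 1 byte each; 你 and 好 take 3 bytes each.
    let total: usize = utf8_lengths(text).iter().map(|&(_, n)| n).sum();
    assert_eq!(total, text.len()); // 12 bytes in total
    assert_eq!(text.chars().count(), 8); // but only 8 chars
}
```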
Python counts the characters correctly:
```python
string = "hello\n你好"
print(len(string))  # 8
```
Rust offers two ways to get the length of a string:
```rust
fn main() {
    let a = "hello\n你好";
    assert_eq!(a.len(), 12); // in bytes
    assert_eq!(a.chars().count(), 8); // in chars (Unicode scalar values), not graphemes
}
```
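Note that `chars()` counts Unicode scalar values, which is not always what a user perceives as one character. The sketch below (standard library only; the helper `counts` is hypothetical) shows that "é" written as `e` plus U+0301 COMBINING ACUTE ACCENT is two `char`s, while the precomposed U+00E9 is one. Counting user-perceived characters (grapheme clusters) follows the rules in Unicode UAX #29, which the standard library does not implement; in Rust this is usually done with a third-party crate such as `unicode-segmentation`.

```rust
/// Hypothetical helper: returns (byte length, char count) of `s`.
fn counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'e' (1 byte) followed by U+0301 COMBINING ACUTE ACCENT (2 bytes):
    // displayed as a single "é", but it is two chars.
    let decomposed = "e\u{301}";
    assert_eq!(counts(decomposed), (3, 2));

    // Precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE: one char, 2 bytes.
    let precomposed = "\u{e9}";
    assert_eq!(counts(precomposed), (2, 1));

    println!("{:?} vs {:?}", counts(decomposed), counts(precomposed));
}
```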
This change would require the underlying Rope structure to be grapheme-aware. In a quick test I ran, that was approximately 3 times slower than the current implementation.
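One common way for a rope to report counts cheaply is to cache a summary per chunk and add the summaries up, rather than rescanning the text. The sketch below is a simplified, hypothetical illustration of that idea (a flat `Vec` of chunks instead of a balanced tree); `Summary` and `ChunkedText` are made-up names and are not Zed's actual rope API. It caches a `char` count, not a grapheme count.

```rust
/// Hypothetical per-chunk summary; not Zed's actual rope API.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Summary {
    bytes: usize,
    chars: usize,
}

impl Summary {
    fn of(text: &str) -> Self {
        Summary { bytes: text.len(), chars: text.chars().count() }
    }

    fn add(self, other: Summary) -> Summary {
        Summary { bytes: self.bytes + other.bytes, chars: self.chars + other.chars }
    }
}

/// A flat list of chunks standing in for a rope's leaves.
struct ChunkedText {
    chunks: Vec<(String, Summary)>,
}

impl ChunkedText {
    fn new() -> Self {
        ChunkedText { chunks: Vec::new() }
    }

    fn push(&mut self, text: &str) {
        // The char count is computed once, when the chunk is created.
        self.chunks.push((text.to_string(), Summary::of(text)));
    }

    fn summary(&self) -> Summary {
        // Totals come from the cached summaries, without rescanning text.
        self.chunks.iter().fold(Summary::default(), |acc, (_, s)| acc.add(*s))
    }
}

fn main() {
    let mut text = ChunkedText::new();
    text.push("hello\n");
    text.push("你好");
    let s = text.summary();
    println!("{} bytes, {} chars", s.bytes, s.chars);
}
```

In a real rope these summaries would live in the nodes of a balanced tree, so the extra work is paid when chunks are built or edited. Replacing the `char` count with a grapheme count would follow the same pattern but needs a more expensive segmentation step per chunk.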
If making the Rope character-aware is not feasible, it would be nice to label the count "Bytes" instead of "Characters" to avoid confusion.
I don't think byte counts are useful for users of a text editor.