
Unicode Character Counts

[Open] ifsheldon opened this issue 9 months ago • 3 comments

Check for existing issues

  • [X] Completed

Describe the feature

I'd like character counting to be based on Unicode characters instead of bytes. This would greatly help non-English users get basic character counts right.

If applicable, add mockups / screenshots to help present your vision of the feature

For example:

hello
你好

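This should give a character count of (5 + 1 + 2) = 8 (five letters, one newline, two Chinese characters), instead of (5 + 1 + 3 × 2) = 12 as shown below, because 你 and 好 each take 3 bytes in UTF-8.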

[Screenshot: Zed reporting a character count of 12 for this text]
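To make the arithmetic above concrete, here is a small Rust sketch (illustrative only, not Zed's code; the helper name `byte_and_char_counts` is hypothetical) that prints how many UTF-8 bytes each character in the example occupies. ASCII letters and `\n` take 1 byte each, while 你 and 好 take 3 bytes each, which is where the 12 comes from.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let text = "hello\n你好";

    // Show how many UTF-8 bytes each char occupies.
    for c in text.chars() {
        // len_utf8() is 1 for ASCII (including '\n') and 3 for 你 and 好.
        println!("{:?} -> {} byte(s)", c, c.len_utf8());
    }

    let (bytes, chars) = byte_and_char_counts(text);
    println!("bytes = {bytes}, chars = {chars}"); // bytes = 12, chars = 8
}
```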

Python counts the characters correctly:

string = "hello\n你好"
print(len(string))  # 8

Rust offers two ways to get the length of a string:

let a = "hello\n你好";
assert_eq!(a.len(), 12); // in bytes
assert_eq!(a.chars().count(), 8); // in Unicode scalar values (chars), not grapheme clusters
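Note that `chars().count()` counts Unicode scalar values, not grapheme clusters (user-perceived characters). For the example above the two agree, but they differ for combining sequences: "é" written as `e` followed by U+0301 COMBINING ACUTE ACCENT looks like one character yet is two `char`s. The sketch below uses only the standard library, which has no grapheme segmentation; counting graphemes would need something like the third-party `unicode-segmentation` crate. The helper name `scalar_stats` is hypothetical.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of chars).
fn scalar_stats(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Precomposed é (U+00E9): one scalar value, 2 bytes in UTF-8.
    let precomposed = "\u{e9}";
    // Decomposed é: 'e' + U+0301, two scalar values, 3 bytes in UTF-8,
    // but still a single grapheme cluster on screen.
    let decomposed = "e\u{301}";

    println!("{:?}", scalar_stats(precomposed)); // (2, 1)
    println!("{:?}", scalar_stats(decomposed)); // (3, 2)
}
```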

ifsheldon · May 19 '24 03:05

This change would require the underlying Rope structure to be grapheme-aware. In a quick test I did, that was approximately 3 times slower than the current implementation.
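One way to see where such costs come from: a rope can cache per-chunk counts so totals are obtained by summing summaries instead of rescanning the text. The sketch below is a toy, not Zed's actual `rope` types; `Summary` and `ChunkedText` are hypothetical, and a real rope would keep these summaries in a balanced tree rather than a flat `Vec`. Byte and `char` counts are additive across chunks, so caching them is cheap. Grapheme clusters, by contrast, can straddle a chunk boundary, so per-chunk grapheme counts do not simply add up; that is one plausible source of extra cost, offered here as an assumption.

```rust
/// Hypothetical per-chunk summary; not Zed's actual rope types.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Summary {
    bytes: usize,
    chars: usize, // Unicode scalar values
}

impl Summary {
    fn of(text: &str) -> Self {
        Summary { bytes: text.len(), chars: text.chars().count() }
    }

    // Byte and char counts are additive across chunks.
    fn add(self, other: Summary) -> Summary {
        Summary { bytes: self.bytes + other.bytes, chars: self.chars + other.chars }
    }
}

/// A toy "rope": a flat list of chunks, each with a cached summary.
struct ChunkedText {
    chunks: Vec<(String, Summary)>,
}

impl ChunkedText {
    fn new(pieces: &[&str]) -> Self {
        let chunks = pieces.iter().map(|p| (p.to_string(), Summary::of(p))).collect();
        ChunkedText { chunks }
    }

    /// Replace one chunk; only that chunk's summary is recomputed.
    fn replace_chunk(&mut self, index: usize, text: &str) {
        self.chunks[index] = (text.to_string(), Summary::of(text));
    }

    /// Totals come from cached summaries, without rescanning the text.
    fn summary(&self) -> Summary {
        self.chunks.iter().fold(Summary::default(), |acc, (_, s)| acc.add(*s))
    }
}

fn main() {
    let mut text = ChunkedText::new(&["hello\n", "你好"]);
    println!("{:?}", text.summary()); // Summary { bytes: 12, chars: 8 }

    text.replace_chunk(1, "你好世界");
    println!("{:?}", text.summary()); // Summary { bytes: 18, chars: 10 }
}
```

Counting `char`s this way already yields 8 for the example in the issue, without full grapheme segmentation.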

JunkuiZhang · May 19 '24 14:05

If that is not possible, it would be nice to show "Bytes" instead of "Characters" to avoid confusion.

asdfer-1234 · May 20 '24 11:05

I don't think byte counts are useful for users of a text editor.

ifsheldon · May 21 '24 03:05