`Characters.operator ==` should document that it doesn't compare normalized forms

Open. jamesderlin opened this issue 1 year ago • 1 comment

I expected that Characters.operator == would compare normalized forms, but it doesn't. (See https://stackoverflow.com/q/64094438/.)

If it intentionally doesn't, it would be nice if the operator == documentation explicitly stated that (and ideally recommended what people should do to normalize Unicode strings instead).

jamesderlin, Mar 23 '23 17:03

This package does exactly one thing: Grapheme cluster segmentation in the default locale.

The documentation for == definitely needs fixing (what's it even saying?), but the fix will be to say that characters are equal if their underlying strings are equal, which means containing the same sequence of UTF-16 code units. (Or, what it tries to say now, that the Characters iterable values contain the same sequence of grapheme cluster substrings, which amounts to the same thing.)
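To make the distinction concrete, here is a minimal sketch of the two notions of equality discussed above. It is written in Python rather than Dart and does not use the `characters` package; the helper `utf16_code_units` and the sample strings are illustrative assumptions, not part of any package API. It only shows that two canonically equivalent spellings of "é" differ as UTF-16 code unit sequences but match once both are normalized to NFC.

```python
import unicodedata


def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Two canonically equivalent spellings of the same user-perceived character.
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Equality in the sense described above: same sequence of UTF-16 code units.
same_code_units = utf16_code_units(composed) == utf16_code_units(decomposed)

# Equality after normalizing both sides to the same form (NFC here).
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_code_units, same_after_nfc)
```

Under this reading, callers who want normalization-insensitive comparison would normalize both strings themselves before comparing; which normalization form to use is a choice for the caller, not something the comparison decides.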

lrhn, Mar 23 '23 18:03