Markus Scherer comments

Results: 324 comments of Markus Scherer

can we coalesce quotation mark CE lists into single CEs?

> I thought it was sorted by shifted values ... not a real sort. The real UCA allkeys.txt is sorted with something like alternate=shifted (not sure if that's completely true,...

CheckProperties exception for RGI_Emoji_Flag_Sequence

> I think we should change checkEnum to throw directly, and deal with the rest of the CheckProperties errors another day—the regexes are known to be out of date, for...

CheckProperties exception for RGI_Emoji_Flag_Sequence

> (today I learned about CCC=37 for [U+0901](https://util.unicode.org/UnicodeJsps/character.jsp?a=0901) until Unicode 3.0…)

I got curious about that. That went away in Unicode 2.1.8:

- https://www.unicode.org/Public/2.1-Update2/UnicodeData-2.1.5.txt
- https://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt
- 1998-dec: https://www.unicode.org/history/publicationdates.html

In...

Evaluate consistency and naming of char vs u32 methods in icu_collections and icu_properties

I think the contains overloads work well. Consider changing get_u32 to get_for_u32 or get_from_u32. If a class/trait only ever deals with u32 and not char, then get(u32) should be fine.
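As a rough illustration of the naming suggested above, here is a minimal sketch of a code point map with a `char` getter and a separately named `u32` getter. `ToyCodePointMap`, its fields, and `sample_map` are hypothetical names made up for this example; this is not the actual icu_collections or icu_properties API.

```rust
/// Hypothetical toy map from code points to values.
/// This is NOT the icu_collections API, only a naming sketch.
struct ToyCodePointMap {
    /// (inclusive start, inclusive end, value) for each range.
    ranges: Vec<(u32, u32, u8)>,
    /// Value returned for code points outside all ranges.
    default: u8,
}

impl ToyCodePointMap {
    /// Lookup by raw code point; accepts surrogates, which `char` cannot hold.
    fn get_for_u32(&self, cp: u32) -> u8 {
        self.ranges
            .iter()
            .find(|&&(start, end, _)| start <= cp && cp <= end)
            .map(|&(_, _, value)| value)
            .unwrap_or(self.default)
    }

    /// Lookup by `char`; delegates to the u32 variant.
    fn get(&self, c: char) -> u8 {
        self.get_for_u32(c as u32)
    }
}

/// Builds a small sample map: a-z map to 1, surrogates map to 2, all else 0.
fn sample_map() -> ToyCodePointMap {
    ToyCodePointMap {
        ranges: vec![(0x61, 0x7A, 1), (0xD800, 0xDFFF, 2)],
        default: 0,
    }
}
```

With this shape, `sample_map().get('a')` and `sample_map().get_for_u32(0x61)` return the same value, while only `get_for_u32` can be asked about a surrogate such as 0xD800.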

Evaluate consistency and naming of char vs u32 methods in icu_collections and icu_properties

Note: The data structures are designed to map from *code points* to values. In Rust, supporting all code points requires u32 because char forbids surrogate code points. Therefore, one could...
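The point about `char` and surrogates can be checked with the standard library alone. The following is a small sketch; the helper names `is_surrogate`, `is_code_point`, and `to_char` are invented for illustration, while `char::from_u32` is the standard conversion.

```rust
/// True if `cp` is a surrogate code point (U+D800..=U+DFFF).
fn is_surrogate(cp: u32) -> bool {
    (0xD800..=0xDFFF).contains(&cp)
}

/// True if `cp` is in the Unicode code point range (0..=0x10FFFF).
/// Surrogates are code points too, so they pass this check.
fn is_code_point(cp: u32) -> bool {
    cp <= 0x10FFFF
}

/// Converts a code point to `char` where Rust allows it.
/// Returns None for surrogates and for values above 0x10FFFF.
fn to_char(cp: u32) -> Option<char> {
    char::from_u32(cp)
}
```

Here `is_code_point(0xD800)` is true, yet `to_char(0xD800)` is `None`, which is why a lookup keyed only on `char` cannot cover every code point.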

Evaluate consistency and naming of char vs u32 methods in icu_collections and icu_properties

The [proposal](https://github.com/unicode-org/icu4x/issues/2413#issuecomment-1227572950) wfm.

Evaluate consistency and naming of char vs u32 methods in icu_collections and icu_properties

utf32 is a string encoding. u32 is one possible type for a code point.

ICU-22707 Unicode 16 beta jun04

@eggrobin I have the latest Unicode 16 data here. Locally, tests pass except for intltest rbbi and intltest idna. I will probably disable the failing idna (UTS46) tests for a...

ICU-22707 Unicode 16 beta jun04

> Locally, tests pass except for intltest rbbi and intltest idna. I will probably disable the failing idna (UTS46) tests for a while.

Done. Locally, only intltest rbbi fails now.

ICU-22707 Unicode 16 beta jun04

> > Can you please update the segmentation code & data as needed?
>
> In this branch, or in a separate PR?

This pull request here is set up...