Markus Scherer
> I thought it was sorted by shifted values ... not a real sort. The real UCA allkeys.txt is sorted with something like alternate=shifted (not sure if that's completely true,...
> I think we should change checkEnum to throw directly, and deal with the rest of the CheckProperties errors another day—the regexes are known to be out of date, for...
> (today I learned about CCC=37 for [U+0901](https://util.unicode.org/UnicodeJsps/character.jsp?a=0901) until Unicode 3.0…)

I got curious about that. That went away in Unicode 2.1.8:
- https://www.unicode.org/Public/2.1-Update2/UnicodeData-2.1.5.txt
- https://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt
- 1998-dec: https://www.unicode.org/history/publicationdates.html

In...
I think the contains overloads work well. Consider changing get_u32 to get_for_u32 or get_from_u32. If a class/trait only ever deals with u32 and not char, then get(u32) should be fine.
Note: The data structures are designed to map from *code points* to values. In Rust, supporting all code points requires u32 because char forbids surrogate code points. Therefore, one could...
The [proposal](https://github.com/unicode-org/icu4x/issues/2413#issuecomment-1227572950) wfm.
utf32 is a string encoding. u32 is one possible type for a code point.
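To make the `char` versus `u32` distinction concrete, here is a minimal sketch, assuming a hypothetical `CodePointMap` type (it is not ICU4X's actual API). Only the method name `get_for_u32` is taken from the naming suggestion above; the range-list storage is chosen purely for brevity.

```rust
/// Hypothetical code-point-to-value map, for illustration only.
pub struct CodePointMap {
    // Sorted, non-overlapping (start, end_inclusive, value) ranges.
    ranges: Vec<(u32, u32, u8)>,
    // Value returned for code points not covered by any range.
    default: u8,
}

impl CodePointMap {
    pub fn new(mut ranges: Vec<(u32, u32, u8)>, default: u8) -> Self {
        ranges.sort_by_key(|r| r.0);
        Self { ranges, default }
    }

    /// Lookup by `char`. A `char` is a Unicode scalar value,
    /// so it can never be a surrogate code point.
    pub fn get(&self, c: char) -> u8 {
        self.get_for_u32(c as u32)
    }

    /// Lookup by any code point, including surrogates U+D800..=U+DFFF.
    /// Values outside all ranges (even above U+10FFFF) yield the default.
    pub fn get_for_u32(&self, cp: u32) -> u8 {
        self.ranges
            .iter()
            .find(|&&(start, end, _)| start <= cp && cp <= end)
            .map(|&(_, _, v)| v)
            .unwrap_or(self.default)
    }
}
```

With this shape, a surrogate such as `0xD800` can only be looked up through `get_for_u32`, because `char::from_u32(0xD800)` returns `None`. That is why a `char`-only API cannot cover every code point.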
@eggrobin I have the latest Unicode 16 data here. Locally, tests pass except for intltest rbbi and intltest idna. I will probably disable the failing idna (UTS46) tests for a...
> Locally, tests pass except for intltest rbbi and intltest idna. I will probably disable the failing idna (UTS46) tests for a while.

Done. Locally, only intltest rbbi fails now.
> > Can you please update the segmentation code & data as needed?
>
> In this branch, or in a separate PR?

This pull request here is set up...