Henri Sivonen
FWIW, I have not tested this PR, but I have tested the upstream `IdnaTestV2.txt` in combination with the IDNA back end that Firefox uses (but testing in isolation from Firefox) and...
> For the small type, the special boundary lines up with the boundary between 2-code-unit and 3-code-unit UTF-8 sequences. Oops. That's not correct. With the small trie mode, the special...
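For reference, a minimal sketch of where the boundary between 2-code-unit and 3-code-unit UTF-8 sequences sits, using only the standard library. It says nothing about where the trie's special boundary is; the helper names are made up for illustration.

```rust
/// First scalar value that needs three UTF-8 code units (bytes).
const FIRST_THREE_UNIT: char = '\u{0800}';

/// Number of UTF-8 code units needed to encode `c`.
fn utf8_code_units(c: char) -> usize {
    c.len_utf8()
}

/// True if `c` encodes as exactly two UTF-8 code units,
/// i.e. it lies in U+0080..=U+07FF.
fn is_two_unit(c: char) -> bool {
    ('\u{0080}'..FIRST_THREE_UNIT).contains(&c)
}
```

So U+07FF is the last scalar value that fits in two code units, and U+0800 is the first that needs three.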
Sadly, https://github.com/rust-lang/rust/issues/110998 still isn't in stable, so can't use `core::ascii::Char` as an argument type to avoid an `unsafe` invariant on `u8`.
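To illustrate what moving the invariant into the argument type means, here is a rough sketch on stable Rust using a checked newtype. `AsciiByte` and `lookup_ascii` are hypothetical names, not anything from the PR or from `core`, and this is not necessarily a good fit for the actual API.

```rust
/// Hypothetical wrapper: a `u8` that is guaranteed to be in 0x00..=0x7F.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiByte(u8);

impl AsciiByte {
    /// Checked constructor: the only way to obtain an `AsciiByte`,
    /// so the ASCII invariant is established here and nowhere else.
    pub const fn new(b: u8) -> Option<Self> {
        if b.is_ascii() {
            Some(AsciiByte(b))
        } else {
            None
        }
    }

    /// Returns the wrapped byte.
    pub const fn get(self) -> u8 {
        self.0
    }
}

/// Hypothetical accessor: can rely on the byte being ASCII
/// without an `unsafe` contract on a bare `u8` argument.
pub fn lookup_ascii(b: AsciiByte) -> usize {
    b.get() as usize
}
```

The cost is that every caller has to go through `AsciiByte::new`, which is the kind of friction a standard `core::ascii::Char` would avoid.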
Apparently, there's https://github.com/rust-lang/rust/issues/123646 that could safely describe the API for two- and three-byte UTF-8 accessors, but it's not on stable, either.
> I can't really give a good answer without knowing how the invariants work. Generally I wish invariants to be as local as possible: easily tracked through the code with...
GitHub won't let me add @smaug---- as a reviewer here.
The CI failure seems to be with JDK install in the infra and not a bug with the patch.
Merged per off-GitHub discussion with @smaug----
Hangul syllables are supposed to compare equal with the corresponding conjoining jamo, and the individual jamo here aren't conjoining jamo. I agree that it's surprising that non-conjoining and conjoining jamo...
The UCA spec discusses [multiple methods](https://www.unicode.org/reports/tr10/#Hangul_Collation) of handling conjoining jamo, and I'm not sure which one ICU4C and ICU4X use.
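To make the conjoining vs. non-conjoining distinction concrete, here is a small sketch of the arithmetic canonical decomposition of precomposed Hangul syllables from the Unicode Standard, plus range checks for the two kinds of jamo. It does not show what ICU4C or ICU4X do for collation; the function names are made up, and the conjoining check covers only the main Hangul Jamo block, not the extended blocks.

```rust
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = 19 * N_COUNT; // 11172 syllables

/// Decomposes a precomposed Hangul syllable into conjoining jamo
/// (leading consonant, vowel, optional trailing consonant).
/// Returns `None` for anything that is not a precomposed syllable.
fn decompose_hangul(c: char) -> Option<(char, char, Option<char>)> {
    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = char::from_u32(L_BASE + s_index / N_COUNT)?;
    let v = char::from_u32(V_BASE + (s_index % N_COUNT) / T_COUNT)?;
    let t_index = s_index % T_COUNT;
    let t = if t_index == 0 {
        None
    } else {
        char::from_u32(T_BASE + t_index)
    };
    Some((l, v, t))
}

/// Conjoining jamo in the Hangul Jamo block (extended blocks omitted).
fn is_conjoining_jamo(c: char) -> bool {
    matches!(c, '\u{1100}'..='\u{11FF}')
}

/// Non-conjoining jamo in the Hangul Compatibility Jamo block.
fn is_compatibility_jamo(c: char) -> bool {
    matches!(c, '\u{3130}'..='\u{318F}')
}
```

For example, U+D55C (한) decomposes to U+1112, U+1161, U+11AB, all of which are conjoining jamo, whereas U+3131 (ㄱ) is a compatibility jamo and never appears in such a decomposition.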