icu4x
                        Make UTS 46 normalization non-experimental
- Bake ignored/disallowed data into the normalization data after all.
- Make public operations available via a dedicated wrapper type instead of the main normalizer types.
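The two bullets above (baked-in ignored/disallowed data and a dedicated wrapper type) can be pictured roughly as follows. This is a toy sketch, not the ICU4X API: `ToyNormalizer`, `ToyUts46Mapper`, `Entry`, and the three-rule table are all hypothetical. It also assumes, for illustration only, that ignored characters are dropped and disallowed characters become U+FFFD.

```rust
/// Hypothetical per-character outcome, baked directly into the (toy) mapping
/// data instead of living in a separate ignored/disallowed table.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Entry {
    Keep,
    Map(char),
    Ignore,
    Disallow,
}

/// Hypothetical stand-in for a general-purpose normalizer. Its lookup stays
/// private, so UTS 46 behavior is not part of this type's public surface.
struct ToyNormalizer;

impl ToyNormalizer {
    fn lookup(&self, c: char) -> Entry {
        match c {
            // Made-up rule: ASCII uppercase maps to lowercase.
            'A'..='Z' => Entry::Map(c.to_ascii_lowercase()),
            // Made-up rule: SOFT HYPHEN is ignored.
            '\u{00AD}' => Entry::Ignore,
            // Made-up rule: the noncharacter U+FFFF is disallowed.
            '\u{FFFF}' => Entry::Disallow,
            _ => Entry::Keep,
        }
    }
}

/// Hypothetical dedicated wrapper: the only public entry point for the
/// UTS 46-style operation.
pub struct ToyUts46Mapper {
    inner: ToyNormalizer,
}

impl ToyUts46Mapper {
    pub fn new() -> Self {
        ToyUts46Mapper { inner: ToyNormalizer }
    }

    /// Applies the baked-in entry for each character: kept, replaced,
    /// dropped (ignored), or replaced with U+FFFD (disallowed).
    pub fn map(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for c in input.chars() {
            match self.inner.lookup(c) {
                Entry::Keep => out.push(c),
                Entry::Map(m) => out.push(m),
                Entry::Ignore => {}
                Entry::Disallow => out.push('\u{FFFD}'),
            }
        }
        out
    }
}
```

Keeping the operation on its own type means the main normalizer types do not grow UTS 46-specific public methods; callers that need UTS 46 construct the wrapper explicitly.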
 
Closes #2850
The ICU PR is https://github.com/unicode-org/icu/pull/2945
See https://github.com/hsivonen/rust-url/blob/icu4x/idna/src/uts46.rs for usage.
The remaining CI failures seem to be due to the ICU4C part not yet contributing to the exported data, so it should be OK to start reviewing this.
It doesn't make much sense to expose this API over FFI. If other languages want ICU4X-backed UTS 46, it makes sense to introduce FFI for https://github.com/hsivonen/rust-url/blob/icu4x/idna/src/uts46.rs .
Changing review request to @eggrobin per discussion with @echeran.
@eggrobin says he is not the correct person to review this from the code side. Adding back @echeran for the code review.
question + suggestion (optional): IIUC, at this point, UTS 46 is pretty much IDNA2008. Is that correct? If so, you could mention that in the docs, because UTS 46 covers not just IDNA2008 but also IDNA2003 and the transition rules between the two. If everyone, including all major browsers, has switched over to IDNA2008 by now, adding that extra specificity would help the docs.
No, UTS 46 non-transitional accepts a superset of what IDNA 2008 accepts and the major browsers use UTS 46 non-transitional specifically. (To use an example from the UTS 46 spec, switching to pure IDNA 2008 would break URLs like http://www.ÖBB.at/ .) Beyond the remark in the docs about the three major browser engines, I'd prefer to leave the characterization of the situation to the UTS 46 spec itself.
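A toy contrast can make the ÖBB example concrete. It is a deliberately simplified sketch, not real UTS 46 or IDNA 2008 processing: the hypothetical `uts46_like_map` approximates UTS 46 mapping with plain lowercasing, and the hypothetical `idna2008_like_accepts` approximates pure IDNA 2008 validity (no mapping step) by rejecting any code point that lowercasing would change.

```rust
/// Hypothetical, heavily simplified stand-in for UTS 46 mapping.
/// Real UTS 46 uses its own mapping table plus normalization, not just lowercasing.
fn uts46_like_map(label: &str) -> String {
    label.to_lowercase()
}

/// Hypothetical, heavily simplified stand-in for pure IDNA 2008 validity.
/// With no mapping applied, a code point that lowercasing would change
/// (e.g. an uppercase letter) is rejected instead of being mapped.
fn idna2008_like_accepts(label: &str) -> bool {
    label
        .chars()
        .all(|c| c.to_lowercase().eq(std::iter::once(c)))
}
```

Under this toy model, `uts46_like_map("ÖBB")` yields `"öbb"`, while `idna2008_like_accepts("ÖBB")` is `false`: the uppercase label only survives on the UTS 46 side, which is the kind of breakage the comment describes.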
LGTM
Thanks! Since https://github.com/unicode-org/icu/pull/2945 looks ready to land, I'll open another ICU4C PR with a backport to the maintenance branch. Then we can make datagen over on the ICU4X side pull the updated export before merging this.
Opened https://github.com/unicode-org/icu4x/issues/4905 about using data exported with the backport of the ICU4C patch.