
Make UTS 46 normalization non-experimental

Open hsivonen opened this issue 1 year ago • 8 comments

  • Bake ignored/disallow data into the normalization data after all.
  • Make public operations available via a dedicated wrapper type instead of the main normalizer types.

Closes #2850

hsivonen avatar Mar 20 '24 11:03 hsivonen

The ICU PR is https://github.com/unicode-org/icu/pull/2945

hsivonen avatar Apr 04 '24 11:04 hsivonen

See https://github.com/hsivonen/rust-url/blob/icu4x/idna/src/uts46.rs for usage.

hsivonen avatar Apr 04 '24 12:04 hsivonen

The remaining CI failures appear to be due to the ICU4C part not yet contributing to the exported data, so it seems OK to start reviewing this.

It doesn't make much sense to expose this API over FFI. If other languages want ICU4X-backed UTS 46, it makes sense to introduce FFI for https://github.com/hsivonen/rust-url/blob/icu4x/idna/src/uts46.rs .

hsivonen avatar Apr 04 '24 19:04 hsivonen

Changing review request to @eggrobin per discussion with @echeran .

hsivonen avatar Apr 15 '24 11:04 hsivonen

@eggrobin says he is not the correct person to review this from the code side. Adding back @echeran for the code review.

sffc avatar May 03 '24 21:05 sffc

question + suggestion (optional): IIUC, at this point, UTS 46 is pretty much IDNA2008. Is that correct? If so, you could mention that in the docs, because UTS 46 covers not just IDNA2008 but also IDNA2003 and the transition rules between the two. If, at this point, everyone has switched over to IDNA2008, including all major browsers, it would help the docs to add that extra specificity.

echeran avatar May 14 '24 18:05 echeran

> question + suggestion (optional): IIUC, at this point, UTS 46 is pretty much IDNA2008. Is that correct? If so, you could mention that in the docs, because UTS 46 covers not just IDNA2008 but also IDNA2003 and the transition rules between the two. If, at this point, everyone has switched over to IDNA2008, including all major browsers, it would help the docs to add that extra specificity.

No, UTS 46 non-transitional accepts a superset of what IDNA 2008 accepts and the major browsers use UTS 46 non-transitional specifically. (To use an example from the UTS 46 spec, switching to pure IDNA 2008 would break URLs like http://www.ÖBB.at/ .) Beyond the remark in the docs about the three major browser engines, I'd prefer to leave the characterization of the situation to the UTS 46 spec itself.
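
To make the superset point concrete, here is a toy sketch in Rust. It is not the ICU4X or `idna` crate API: all function names are hypothetical, plain lowercasing stands in for the much larger UTS 46 mapping table, and "contains no uppercase letters" stands in for strict IDNA 2008 validity. It only illustrates that UTS 46 maps first and validates second, whereas a strict check with no mapping step rejects the input outright.

```rust
/// Toy stand-in for the UTS 46 mapping step (assumption: lowercasing only).
/// The real mapping also covers normalization and ignored/disallowed characters.
fn toy_uts46_map(domain: &str) -> String {
    domain.chars().flat_map(char::to_lowercase).collect()
}

/// Toy stand-in for a strict check with no mapping step
/// (assumption: the only rule modeled is "no uppercase letters").
fn toy_strict_accepts(domain: &str) -> bool {
    !domain.chars().any(char::is_uppercase)
}

/// Toy UTS 46-style check: apply the mapping, then run the same validity rule.
fn toy_uts46_accepts(domain: &str) -> bool {
    toy_strict_accepts(&toy_uts46_map(domain))
}
```

Under these assumptions, `toy_uts46_accepts("www.ÖBB.at")` holds because the host is mapped to `www.öbb.at` before validation, while `toy_strict_accepts("www.ÖBB.at")` does not, since nothing maps the uppercase letters away.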

> LGTM

Thanks! Since https://github.com/unicode-org/icu/pull/2945 looks ready to land, I'll open another ICU4C PR with a backport to the maintenance branch. Then we can make datagen over on the ICU4X side pull the updated export before merging this.

hsivonen avatar May 15 '24 08:05 hsivonen

Opened https://github.com/unicode-org/icu4x/issues/4905 about using data exported with the backport of the ICU4C patch.

hsivonen avatar May 15 '24 09:05 hsivonen