
Feature-gate UTF8 and UTF16 normalization

Open kirawi opened this issue 1 year ago • 2 comments

It reduces dependencies and compilation time.

kirawi avatar Jun 02 '24 21:06 kirawi

It would remove utf8_iter and utf16_iter.

These are small dependencies that we use fairly widely in icu4x.

sffc avatar Jun 04 '24 03:06 sffc

utf8_iter can be removed at 1.79 MSRV because it gets added to the standard library: https://doc.rust-lang.org/std/primitive.slice.html#method.utf8_chunks

robertbastian avatar Jun 27 '24 12:06 robertbastian

> utf8_iter can be removed at 1.79 MSRV because it gets added to the standard library: https://doc.rust-lang.org/std/primitive.slice.html#method.utf8_chunks

That's not the same thing. utf8_iter provides an API that's so close to chars() on str that macros can work on either.

hsivonen avatar Dec 12 '24 15:12 hsivonen

Some points brought up in discussion:

  • utf8_chunks is not an acceptable replacement: it doesn't have reverse iteration or Results
  • We're okay with adding default features as a non-breaking change.
  • The codepaths here are not shared: str iteration uses a more optimized path.
  • These crates are used in normalizer, collator, and segmenter for alternate APIs. utf8_iter is a mandatory dependency of ixdtf.
  • These crates contain unsafe code.
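
To illustrate the first point, here is a minimal sketch of what a chars()-like view built on the standard library's utf8_chunks looks like. The helper name `lossy_chars` is made up for this example, and it assumes Rust 1.79 or later. It only walks forward, and it reports ill-formed input by inspecting each chunk's `invalid()` bytes rather than through a Result:

```rust
/// Hypothetical helper (not part of icu4x or utf8_iter): decode possibly
/// ill-formed UTF-8 into chars, emitting U+FFFD for each invalid sequence.
/// Requires Rust 1.79+ for `<[u8]>::utf8_chunks`.
pub fn lossy_chars(bytes: &[u8]) -> Vec<char> {
    let mut out = Vec::new();
    // `utf8_chunks` yields forward-only chunks; each chunk is a valid `&str`
    // prefix followed by an (optionally empty) run of invalid bytes.
    for chunk in bytes.utf8_chunks() {
        out.extend(chunk.valid().chars());
        if !chunk.invalid().is_empty() {
            out.push('\u{FFFD}');
        }
    }
    out
}
```

Walking the same input from the end would require collecting the chunks first or writing a separate backward decoder, which is the gap the bullet above refers to.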

Manishearth avatar Dec 12 '24 18:12 Manishearth

Litmus test: if the primary way of using the crate, or one of the primary ways of using the crate, uses APIs that don't pull in a dep, we may wish to consider feature gating, especially if it is unsafe. This is a per-crate call, not a global call: ixdtf may unconditionally use utf8_iter, but normalizer still can gate. This is not a per-dependency decision.

Decision: Feature-gating ill-formed UTF-8 and UTF-16 in normalizer is something we can do. This is not something we must do (@manishearth will probably do it at some point). Let's call the features utf8_iter and utf16_iter unless someone can come up with better names. We could consider a similar approach for other crates, but for now we agree to do it for icu_normalizer.
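
As a rough sketch of what such gating could look like (the function names `normalize_str` and `normalize_utf8` and their bodies are placeholders, not icu_normalizer's actual API), the well-formed str path stays unconditional while the ill-formed UTF-8 path is compiled only when the utf8_iter feature is enabled. This assumes the usual Cargo convention that marking a dependency `optional = true` exposes a feature of the same name:

```rust
// Hypothetical sketch only; names and bodies are placeholders and do not
// reflect icu_normalizer's real API or implementation.

/// Always available: the primary entry point for well-formed `str` input.
/// (Identity placeholder standing in for real normalization.)
pub fn normalize_str(input: &str) -> String {
    input.chars().collect()
}

/// Compiled only when the crate is built with the `utf8_iter` feature.
/// A real implementation would iterate with the optional utf8_iter crate;
/// this stand-in uses std's lossy decoding so the sketch has no dependencies.
#[cfg(feature = "utf8_iter")]
pub fn normalize_utf8(input: &[u8]) -> String {
    String::from_utf8_lossy(input).into_owned()
}
```

Users who only call the str API would then build without the feature and never compile the optional dependency.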

Agreed: @manishearth @sffc @hsivonen @nekevss

Manishearth avatar Dec 12 '24 18:12 Manishearth

Making UTF-16 support optional makes not only utf16_iter optional but also write16.

hsivonen avatar Dec 18 '24 13:12 hsivonen