icu4x
Feature-gate UTF8 and UTF16 normalization
This would reduce dependencies and compilation time.
It would remove utf8_iter and utf16_iter.
These are small dependencies that we use fairly widely in icu4x.
utf8_iter can be removed once the MSRV reaches 1.79, because similar functionality is added to the standard library: https://doc.rust-lang.org/std/primitive.slice.html#method.utf8_chunks
That's not the same thing. utf8_iter provides an API so close to chars() on str that you can write macros that work on either.
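To illustrate the "macros that work on either" point, here is a minimal sketch. The macro and both functions are hypothetical, and because utf8_iter is not in the standard library, the ill-formed path uses String::from_utf8_lossy(...).chars() as a stand-in. utf8_iter would yield chars directly from &[u8] without the intermediate Cow<str>.

```rust
/// Hypothetical macro: counts chars matching a predicate. The body only needs
/// an `Iterator<Item = char>`, so it expands identically for `str::chars()`
/// and for a chars()-like iterator over possibly ill-formed UTF-8.
macro_rules! count_matching {
    ($iter:expr, $pred:expr) => {{
        let pred = $pred;
        let mut n = 0usize;
        for c in $iter {
            if pred(c) {
                n += 1;
            }
        }
        n
    }};
}

/// Well-formed input: plain `str::chars()`.
pub fn count_upper_str(s: &str) -> usize {
    count_matching!(s.chars(), |c: char| c.is_uppercase())
}

/// Possibly ill-formed input. Stand-in for utf8_iter: ill-formed sequences
/// become U+FFFD, which is not uppercase and so is not counted.
pub fn count_upper_bytes(b: &[u8]) -> usize {
    count_matching!(String::from_utf8_lossy(b).chars(), |c: char| c.is_uppercase())
}
```

The macro body never mentions the input type, which is why an iterator with the same shape as chars() is more convenient here than a chunk-based API.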
Some points brought up in discussion:
- utf8_chunks is not an acceptable replacement: it doesn't have reverse iteration or Results.
- We're okay with adding default features as a non-breaking change.
- The codepaths here are not shared: str iteration uses a more optimized path.
- These crates are used in normalizer, collator, and segmenter for alternate APIs. They are used mandatorily by ixdtf.
- These crates contain unsafe code.
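For comparison, this is roughly what char-by-char lossy decoding looks like with the standard library alone. It is a sketch assuming Rust 1.79+ (where slice::utf8_chunks is stable); the function name lossy_chars is hypothetical. Each Utf8Chunk is a run of valid UTF-8 followed by at most one invalid sequence, so the caller has to reassemble chars and substitute U+FFFD itself, and the result is forward-only rather than a per-character Result iterator.

```rust
/// Hypothetical helper: lazily decode possibly ill-formed UTF-8 into chars,
/// replacing each invalid sequence reported by `utf8_chunks` with U+FFFD.
pub fn lossy_chars(bytes: &[u8]) -> impl Iterator<Item = char> + '_ {
    bytes.utf8_chunks().flat_map(|chunk| {
        // `valid()` borrows from `bytes`, not from `chunk`, so this is fine.
        let replacement = if chunk.invalid().is_empty() {
            None
        } else {
            Some(char::REPLACEMENT_CHARACTER)
        };
        chunk.valid().chars().chain(replacement)
    })
}
```

This produces the same text as String::from_utf8_lossy, one U+FFFD per invalid chunk, but errors only surface as substituted characters.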
Litmus test: if the primary way of using a crate, or one of its primary ways, uses APIs that don't pull in a dependency, we may wish to consider feature-gating that dependency, especially if it contains unsafe code. This is a per-crate call, not a global one: ixdtf may unconditionally use utf8_iter while normalizer still gates it. It is not a per-dependency decision.
Decision: Feature-gating ill-formed UTF-8 and UTF-16 support in the normalizer is something we can do, but not something we must do (@manishearth will probably do it at some point). Let's call the features utf8_iter and utf16_iter unless someone comes up with better names. We could consider a similar approach for other crates, but for now we agree to do it for icu_normalizer.
Agreed: @manishearth @sffc @hsivonen @nekevss
Making UTF-16 support optional makes not only utf16_iter optional but also write16.
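A minimal sketch of how the gating could look, assuming the feature names from the decision and default-on features so the change stays non-breaking. The Cargo.toml fragment is shown as comments, and the two functions are hypothetical stand-ins that count scalar values rather than normalize. The gated body uses std's char::decode_utf16 in place of the utf16_iter crate.

```rust
// Hypothetical Cargo.toml for icu_normalizer (feature names per the decision):
//
// [features]
// default = ["utf8_iter", "utf16_iter"]
// utf8_iter = ["dep:utf8_iter"]
// utf16_iter = ["dep:utf16_iter", "dep:write16"]
//
// with utf8_iter, utf16_iter and write16 each declared `optional = true`
// under [dependencies].

/// Always compiled: the `&str` path needs no extra dependency, because
/// `str::chars()` already relies on the input being well-formed UTF-8.
pub fn process_str(s: &str) -> usize {
    s.chars().count()
}

/// Compiled only with the (hypothetical) `utf16_iter` feature. In the real
/// crate this path would be backed by utf16_iter and write16; here std's
/// `char::decode_utf16` stands in, and unpaired surrogates are still counted.
#[cfg(feature = "utf16_iter")]
pub fn process_utf16(units: &[u16]) -> usize {
    char::decode_utf16(units.iter().copied()).count()
}
```

Because gating is a per-crate call, a crate like ixdtf could keep its dependency unconditional while the normalizer exposes its ill-formed-input APIs only behind these features.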