Add LocaleData parameter for word/sentence segmenter
This is a part of https://github.com/unicode-org/icu4x/issues/3284.
ICU4C has language-specific break rules for the word and sentence segmenters, so this change adds the corresponding per-locale rules to ICU4X.
This adds a LocaleData argument to all constructors. Because the locale differences are small and affect only two pieces of data, I added an override-table data marker for the machine state property.
Example code for how to optionally load the locale data:
if let Some(locale) = options.content_locale {
    let request = // ...
    match provider.load(request) {
        Ok(response) => {
            self.locale_specific_data = response.payload;
        }
        Err(DataError {
            kind: DataErrorKind::IdentifierNotFound,
            ..
        }) => {
            // fall through
        }
        Err(e) => return Err(e),
    }
}
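For reviewers who want to try the fallback behaviour in isolation, here is a minimal, self-contained sketch of the same pattern. Everything in it is a stand-in for illustration: `MockProvider`, its `load` method, the simplified `DataError`/`DataErrorKind` types, and `load_locale_override` are hypothetical and are not the real `icu_provider` API. The locale string "sv" used in the note below is an arbitrary placeholder.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the provider error types (hypothetical, not icu_provider).
#[derive(Debug, PartialEq)]
enum DataErrorKind {
    IdentifierNotFound,
    Other,
}

#[derive(Debug, PartialEq)]
struct DataError {
    kind: DataErrorKind,
}

// Hypothetical provider: maps a locale string to an override payload.
struct MockProvider {
    overrides: HashMap<&'static str, &'static str>,
    // When true, every load fails with an error other than "not found".
    broken: bool,
}

impl MockProvider {
    fn load(&self, locale: &str) -> Result<&'static str, DataError> {
        if self.broken {
            return Err(DataError { kind: DataErrorKind::Other });
        }
        self.overrides
            .get(locale)
            .copied()
            .ok_or(DataError { kind: DataErrorKind::IdentifierNotFound })
    }
}

// Returns Ok(None) when no locale was requested or when the provider has no
// override for it; any other error is propagated to the caller.
fn load_locale_override(
    provider: &MockProvider,
    content_locale: Option<&str>,
) -> Result<Option<&'static str>, DataError> {
    let Some(locale) = content_locale else {
        return Ok(None);
    };
    match provider.load(locale) {
        Ok(payload) => Ok(Some(payload)),
        // Missing locale data is not an error: fall back to the root rules.
        Err(DataError { kind: DataErrorKind::IdentifierNotFound }) => Ok(None),
        Err(e) => Err(e),
    }
}
```

With a provider that only has an entry for "sv", calling `load_locale_override(&provider, Some("sv"))` yields the override, while `Some("en")` and `None` both yield `Ok(None)`. Only errors other than `IdentifierNotFound` are returned to the caller.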
My understanding is that the current macro (gen_any_buffer_data_constructors) doesn't support an optional locale parameter.
Should try_new_unstable<D>(provider: &D) become try_new_unstable<D>(provider: &D, locale: Option<&LocaleData>), or should we add try_new_for_locale_unstable(...) (as used by datetime), try_new_with_options_unstable(...), or something else?
https://github.com/unicode-org/icu4x/issues/3284#issuecomment-2116381172 explains try_new_with_options_unstable.
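To make the options-based alternative concrete, here is a rough sketch of how the two constructors could sit side by side. This is only an illustration under my own assumptions: `StubProvider`, `StubSegmenter`, `SegmenterOptions`, and the `String` locale and error types are hypothetical placeholders, not the real segmenter types or the signatures proposed in the linked comment.

```rust
// Hypothetical options struct; `content_locale` mirrors the field used in the
// example above, with a plain String standing in for a real locale type.
#[derive(Debug, Default, Clone, PartialEq)]
struct SegmenterOptions {
    content_locale: Option<String>,
}

// Placeholder provider; a real one would serve segmenter data.
struct StubProvider;

// Placeholder segmenter that only records which locale override was requested.
#[derive(Debug, PartialEq)]
struct StubSegmenter {
    locale_override: Option<String>,
}

impl StubSegmenter {
    // Existing entry point keeps its signature and delegates with defaults,
    // so current callers are unaffected.
    fn try_new_unstable(provider: &StubProvider) -> Result<Self, String> {
        Self::try_new_with_options_unstable(provider, SegmenterOptions::default())
    }

    // New entry point: the optional locale travels inside the options struct.
    fn try_new_with_options_unstable(
        _provider: &StubProvider,
        options: SegmenterOptions,
    ) -> Result<Self, String> {
        Ok(Self {
            locale_override: options.content_locale,
        })
    }
}
```

In this shape the optional locale never appears in the signature of the existing constructor, which may make it easier to keep the constructor-generating macro unchanged.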