Add LocaleData parameter for word/sentence segmenter

Open makotokato opened this issue 1 year ago • 3 comments

This is a part of https://github.com/unicode-org/icu4x/issues/3284.

ICU4C has some language-specific break rules for the word and sentence segmenters, so this change adds per-locale rules to ICU4X.

This adds a LocaleData argument to all constructors. Also, because the locale differences are small and involve only two pieces of data, I add an override table data marker for the machine state property.
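To make the override-table idea concrete, here is a minimal sketch of how a small per-locale table could patch a root property table. Everything in it is an assumption for illustration: the names `PropertyTables`, `break_property`, and the property constants are hypothetical, the tables are plain slices rather than whatever structure the PR actually uses, and the convention that `0` means "no override" is only one possible design.

```rust
// Hypothetical break-property values; 0 is reserved to mean "no override here".
const NO_OVERRIDE: u8 = 0;
const OTHER: u8 = 1;
const LETTER: u8 = 2;
const MID_LETTER: u8 = 3;

// Hypothetical container: a root table plus an optional, much smaller
// per-locale override table. Not the real ICU4X data structs.
struct PropertyTables {
    base: Vec<(char, u8)>,
    locale_override: Option<Vec<(char, u8)>>,
}

// Linear lookup in a small association list, returning `default` when absent.
fn lookup(table: &[(char, u8)], c: char, default: u8) -> u8 {
    table
        .iter()
        .find(|(k, _)| *k == c)
        .map(|(_, v)| *v)
        .unwrap_or(default)
}

impl PropertyTables {
    // Consult the override first; fall back to the root table when the
    // override has no entry for this character.
    fn break_property(&self, c: char) -> u8 {
        if let Some(overrides) = &self.locale_override {
            let property = lookup(overrides, c, NO_OVERRIDE);
            if property != NO_OVERRIDE {
                return property;
            }
        }
        lookup(&self.base, c, OTHER)
    }
}
```

The appeal of this shape is that the locale data only has to carry the handful of entries that differ, while the root table and the state machine stay shared across locales.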

makotokato, Jul 30 '24 05:07

Example code for how to optionally load the locale data:

if let Some(locale) = options.content_locale {
    let request = // ...
    match provider.load(request) {
        Ok(response) => {
            self.locale_specific_data = response.payload;
        }
        Err(DataError {
            kind: DataErrorKind::IdentifierNotFound,
            ..
        }) => {
            // No data for this locale; fall through and keep the default rules.
        }
        Err(e) => return Err(e),
    }
}
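
For anyone who wants to try the pattern outside the ICU4X tree, here is a self-contained sketch of the same control flow. The types are mock stand-ins, not the real ICU4X API: `MockProvider`, `MockError`, `ErrorKind`, and `Segmenter::try_new` are all hypothetical names, and the payload is just a string.

```rust
// Hypothetical stand-ins for DataErrorKind / DataError; not the real types.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    IdentifierNotFound,
    Other,
}

#[derive(Debug, PartialEq)]
struct MockError {
    kind: ErrorKind,
}

// Hypothetical provider: maps a locale string to an override payload.
struct MockProvider {
    overrides: Vec<(&'static str, &'static str)>,
    broken: bool, // when true, every load fails with a non-recoverable error
}

impl MockProvider {
    fn load(&self, locale: &str) -> Result<&'static str, MockError> {
        if self.broken {
            return Err(MockError { kind: ErrorKind::Other });
        }
        self.overrides
            .iter()
            .find(|(l, _)| *l == locale)
            .map(|(_, payload)| *payload)
            .ok_or(MockError { kind: ErrorKind::IdentifierNotFound })
    }
}

#[derive(Debug, PartialEq)]
struct Segmenter {
    locale_specific_data: Option<&'static str>,
}

impl Segmenter {
    fn try_new(provider: &MockProvider, content_locale: Option<&str>) -> Result<Self, MockError> {
        let mut segmenter = Segmenter { locale_specific_data: None };
        if let Some(locale) = content_locale {
            match provider.load(locale) {
                Ok(payload) => segmenter.locale_specific_data = Some(payload),
                // The provider has no override for this locale: not an error,
                // the segmenter simply keeps its default behavior.
                Err(MockError { kind: ErrorKind::IdentifierNotFound }) => {}
                // Any other failure is propagated to the caller.
                Err(e) => return Err(e),
            }
        }
        Ok(segmenter)
    }
}
```

The point of the pattern is that a missing identifier is treated as "nothing to override" while every other error still propagates to the caller.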

sffc, Aug 11 '24 15:08

My understanding is that the current macro (gen_any_buffer_data_constructors) doesn't support an optional locale parameter.

Should try_new_unstable<D>(provider: &D) become try_new_unstable<D>(provider: &D, locale: Option<&LocaleData>), or should we add try_new_for_locale_unstable(...) (as used by datetime), add try_new_with_options_unstable(...), or something else?
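
To compare the three shapes side by side, here is a rough sketch using mock types. None of this is the real ICU4X API: `Locale`, `Provider`, `WordBreakOptions`, and `WordSegmenter` are hypothetical stand-ins, and the bodies only record which locale was requested instead of loading any data.

```rust
// Hypothetical stand-ins; none of these are the real ICU4X types.
#[derive(Debug, Clone, PartialEq)]
struct Locale(String);

struct Provider;

// Hypothetical options bag carrying the optional content locale.
#[derive(Debug, Default, Clone, PartialEq)]
struct WordBreakOptions {
    content_locale: Option<Locale>,
}

#[derive(Debug, PartialEq)]
struct WordSegmenter {
    locale: Option<Locale>,
}

impl WordSegmenter {
    // Shape 1: the existing constructor grows an optional locale parameter.
    fn try_new_unstable(_provider: &Provider, locale: Option<&Locale>) -> Result<Self, String> {
        Ok(WordSegmenter { locale: locale.cloned() })
    }

    // Shape 2: a separate constructor that always takes a locale.
    fn try_new_for_locale_unstable(provider: &Provider, locale: &Locale) -> Result<Self, String> {
        Self::try_new_unstable(provider, Some(locale))
    }

    // Shape 3: a constructor taking an options struct, so further knobs can
    // be added later without changing the signature again.
    fn try_new_with_options_unstable(
        provider: &Provider,
        options: WordBreakOptions,
    ) -> Result<Self, String> {
        Self::try_new_unstable(provider, options.content_locale.as_ref())
    }
}
```

In this sketch all three end up in the same place, so the choice is mostly about what the constructor macro can generate and how the signature is expected to evolve.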

makotokato, Aug 20 '24 15:08

https://github.com/unicode-org/icu4x/issues/3284#issuecomment-2116381172 explains try_new_with_options_unstable.

makotokato, Aug 21 '24 00:08