icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Add BaseLanguageHandling option to Datagen

Open sffc opened this issue 1 year ago • 11 comments

See https://github.com/unicode-org/icu4x/issues/58

I implemented the new option as BaseLanguageHandling::Retain or BaseLanguageHandling::Strip.

sffc avatar Feb 13 '24 23:02 sffc

Current:

  • Runtime
  • RuntimeManual
  • Hybrid
  • Preresolved

Proposed to add:

  • RuntimeWithBaseLanguages
  • RuntimeManualWithBaseLanguages

  • @sffc - How to we turn off regional variants?
  • @robertbastian - Maybe something like en-XX to not include regional variants, this woulud be more granular because you could do de,en-ZZ for all de variants and no en variants
  • @sffc - I think it should be a variant, because it can also apply to regional variants like en-001. So you may want en-001-nochilds, which includes en and en-001 but not en-GB.

  • @robertbastian - Maybe we want something like
Deduplicated / Runtime2 {
  use_internal_fallback: bool,
  include_base_languages: bool,
}

Now:

#[non_exhaustive]
pub enum FallbackMode {
    #[default]
    DefaultForProvider,
    // same as Deduplicated { RuntimeFallbackLocation::Internal, BaseLanguageHandling::Strip }
    Runtime,
    // same as Deduplicated { RuntimeFallbackLocation::External, BaseLanguageHandling::Strip }
    RuntimeManual,
    Hybrid,
    Preresolved,
    Deduplicated(Deduplicated),
}

2.0:

#[non_exhaustive]
pub enum LocaleExpansionMode {
    Deduplicated(Deduplicated),
    Exhaustive(Exhaustive),
    Preresolved(Preresolved),
}

#[non_exhaustive]
pub enum RuntimeFallbackLocation {
    Internal,
    External,
}

#[non_exhaustive]
pub enum BaseLanguageHandling {
    #[default]
    Retain,
    Strip,
}

#[non_exhaustive]
#[derive(Default)]
pub struct Deduplicated {
  runtime_fallback_location: Option<RuntimeFallbackLocation>,
  base_language_handling: BaseLanguageHandling,
}

// resolve this against the exporter like
foo.runtime_fallback_location.unwrap_or_else(|| 
    if exporter.supports_runtime_fallback() {
        RuntimeFallbackLocation::Internal
    } else { 
        RuntimeFallbackLocation::External 
    }
)

#[non_exhaustive]
pub struct Exhaustive {}

#[non_exhaustive]
pub struct Preresolved {}

LGTM: @Manishearth @sffc @robertbastian

sffc avatar Feb 22 '24 17:02 sffc

@robertbastian What do you want the CLI to look like in 1.5 and in 2.0?

sffc avatar Feb 23 '24 02:02 sffc

I guess for now we flatten the enum like runtime-manual-retain and runtime-strip?

robertbastian avatar Feb 23 '24 08:02 robertbastian

If we are going to end up with a base_language_handling option in the Deduplicated enum, then maybe it makes sense that it should be its own flag on the CLI.

sffc avatar Feb 23 '24 21:02 sffc

It doesn't, because then someone will set base_language_handling for hybrid or preresolved and that doesn't make sense.

robertbastian avatar Feb 23 '24 22:02 robertbastian

We have a lot of flags that are only used if some other flag is set. This would be the same.

sffc avatar Feb 23 '24 22:02 sffc

@robertbastian I would like to merge this as-is because:

  1. The final 2.0 CLI option should indeed be named --base-language-handling because that is how we handle all other cases where flags are not used depending on the value of other flags. If you really want I could make --base-language-handling return an error if set with the wrong value for --fallback.
  2. In the API, we are changing it anyway in 2.0 as discussed above. The change proposed is the smallest change to the API. I don't want to implement only half of the 2.0 changes now. I would rather we, in another PR, implement LocaleExpansionMode as designed above. We could even do that in 1.5 and deprecate FallbackMode. In that PR, we could delete any unreleased APIs such as this one that are covered by LocaleExpansionMode.

sffc avatar Feb 24 '24 02:02 sffc

With my engine maintainer hat on, I have to agree with @sffc. Boa will require all engine embedders to use this feature, so from a docs perspective it's easier to guide our users to activate a simple flag than to explain to them which fallback variants are supported and how to activate them.

jedel1043 avatar Feb 24 '24 02:02 jedel1043

Need to re-do this on top of #4710

sffc avatar Apr 10 '24 07:04 sffc

Probably super seeded by #4836.

jedel1043 avatar Apr 25 '24 01:04 jedel1043

Probably super seeded by #4836.

Things this PR does that #4836 didn't do:

  • Added to CHANGELOG.md
  • Added "tlh-001" to the options test (to test an unsupported non-base locale)
  • Made retaining base languages the default option

sffc avatar Apr 25 '24 01:04 sffc