
Consider support of old locale extension syntax in locale parsing

Open alp opened this issue 1 year ago • 3 comments

We encounter clients that report locales using the "Old Locale Extension Syntax", e.g. ar@numbers=latn. I verified that locid::Locale parsing rejects such strings with the error "The given language subtag is invalid". Should we consider extending parsing support to cover this syntax?

alp avatar Jan 31 '24 17:01 alp

We've previously discussed the degree to which we want to allow lenient parsing of locales. There are several toggles that could go into leniency:

  1. Whether to accept incorrect case: en-us
  2. Whether to accept underscores: en_US
  3. Whether to accept UTS 35 Old Locale Extension Syntax: ar@numbers=latn

I think what we want is a more full-featured locale parsing API. It could be data-driven and live alongside (or within) the LocaleCanonicalizer type.

It's worth noting that ECMA-402 permits case 1 but not 2 or 3.

Some previous related discussion in https://github.com/unicode-org/icu4x/issues/3336, https://github.com/unicode-org/icu4x/issues/1709

CC @zbraniecki

sffc avatar Jan 31 '24 17:01 sffc

I'm against supporting that as part of icu_locid.

This can be introduced as a standalone conversion library outside of ICU4X.
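
As a rough illustration of what such a standalone conversion could look like, here is a minimal std-only sketch that rewrites a lenient string into BCP 47 form before it is handed to a strict parser. The function name normalize_lenient is hypothetical and not part of icu_locid or ICU4X. It assumes only four legacy key names (calendar, collation, currency, numbers), leaves case untouched, and does not translate legacy value aliases or merge with a -u- extension already present in the input.

```rust
/// Hypothetical helper, not an ICU4X API: rewrites a lenient locale string
/// into BCP 47 form. Returns None when the input cannot be converted.
pub fn normalize_lenient(input: &str) -> Option<String> {
    // Split off the old-style keyword list, if present: "ar@numbers=latn".
    let (base, keywords) = match input.split_once('@') {
        Some((b, k)) => (b, Some(k)),
        None => (input, None),
    };
    if base.is_empty() {
        return None;
    }

    // Accept underscores as subtag separators: "en_US" -> "en-US".
    let mut out = base.replace('_', "-");

    // Map legacy key names to their two-letter Unicode extension keys.
    // Assumption: only these four keys are handled in this sketch.
    let mut pairs: Vec<(&'static str, String)> = Vec::new();
    if let Some(keywords) = keywords {
        for item in keywords.split(';').filter(|s| !s.is_empty()) {
            let (key, value) = item.split_once('=')?;
            let short = match key.to_ascii_lowercase().as_str() {
                "calendar" => "ca",
                "collation" => "co",
                "currency" => "cu",
                "numbers" => "nu",
                _ => return None, // unknown legacy key: refuse to guess
            };
            // Values are lowercased but not otherwise translated.
            pairs.push((short, value.to_ascii_lowercase()));
        }
    }

    if !pairs.is_empty() {
        // Emit keywords sorted by key, as in canonical form.
        pairs.sort();
        out.push_str("-u");
        for (key, value) in pairs {
            out.push('-');
            out.push_str(key);
            out.push('-');
            out.push_str(&value);
        }
    }
    Some(out)
}
```

With this sketch, ar@numbers=latn becomes ar-u-nu-latn, and the result can then be passed to the existing strict parser. Unknown keys return None rather than being passed through, so a caller can decide how to report the failure.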

zbraniecki avatar Jan 31 '24 18:01 zbraniecki

Discuss with:

  • @zbraniecki
  • @sffc

Optional:

  • @echeran

sffc avatar Feb 01 '24 18:02 sffc