icu4x Consider support of old locale extension syntax in locale parsing

Open alp opened this issue 1 year ago • 3 comments

We encounter clients that report locales using the "Old Locale Extension Syntax", e.g. ar@numbers=latn. I verified that locid::Locale parsing rejects such strings with the error "The given language subtag is invalid". Should we consider extending parsing support to cover this syntax?

Jan 31 '24 17:01 alp

We've previously discussed the degree to which we want to allow lenient parsing of locales. There are several toggles that could go into leniency:

1. Whether to accept incorrect case: en-us
2. Whether to accept underscores: en_US
3. Whether to accept UTS 35 Old Locale Extension Syntax: ar@numbers=latn

I think what we want is a more full-featured locale parsing API. It could be data-driven and live alongside (or within) the LocaleCanonicalizer type.

It's worth noting that ECMA-402 permits case 1 but not 2 or 3.
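To make the first two toggles concrete, here is a minimal sketch of a pre-processing step that could run before a strict parser. The `Leniency` struct and `normalize` function are hypothetical names for illustration only, not ICU4X APIs, and the casing rule is the simple positional heuristic from BCP 47 (two-letter subtags uppercase, four-letter alphabetic subtags titlecase, everything after a singleton lowercase), not a full canonicalizer.

```rust
/// Hypothetical leniency toggles (illustrative only, not an ICU4X type).
#[derive(Clone, Copy, Default)]
pub struct Leniency {
    /// Toggle 1: rewrite subtag case, e.g. `en-us` becomes `en-US`.
    pub fix_case: bool,
    /// Toggle 2: treat `_` as a subtag separator, e.g. `en_US` becomes `en-US`.
    pub accept_underscores: bool,
}

/// Rewrites `input` into a shape a strict BCP 47 parser is more likely to accept.
/// This does not validate the result; it only adjusts separators and case.
pub fn normalize(input: &str, opts: Leniency) -> String {
    let sep_fixed = if opts.accept_underscores {
        input.replace('_', "-")
    } else {
        input.to_string()
    };
    if !opts.fix_case {
        return sep_fixed;
    }

    let mut after_singleton = false;
    let mut out: Vec<String> = Vec::new();
    for (i, subtag) in sep_fixed.split('-').enumerate() {
        let lower = subtag.to_ascii_lowercase();
        let cased = if i == 0 || after_singleton {
            // Language subtag and everything inside extensions: lowercase.
            lower
        } else if subtag.len() == 2 {
            // Two-letter subtag before any singleton: treat as a region.
            subtag.to_ascii_uppercase()
        } else if subtag.len() == 4 && subtag.chars().all(|c| c.is_ascii_alphabetic()) {
            // Four-letter alphabetic subtag: treat as a script (titlecase).
            let mut t = lower;
            t[..1].make_ascii_uppercase();
            t
        } else {
            lower
        };
        if subtag.len() == 1 {
            after_singleton = true;
        }
        out.push(cased);
    }
    out.join("-")
}
```

With both toggles on, `normalize("ZH_hant_tw", ...)` yields `zh-Hant-TW`; with the default (strict) settings the input is returned unchanged, so the strict parser still gets to reject it.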

Some previous related discussion in https://github.com/unicode-org/icu4x/issues/3336, https://github.com/unicode-org/icu4x/issues/1709

CC @zbraniecki

Jan 31 '24 17:01 sffc

I'm against supporting that as part of icu_locid.

This can be introduced as a standalone conversion library outside of ICU4X.
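As a rough illustration of what such a standalone conversion could look like, the sketch below rewrites the old `@key=value;key=value` form into a `-u-` Unicode extension and leaves the result for a strict parser to validate. The function name `convert_old_syntax` is hypothetical. It assumes only four legacy key names (calendar, collation, currency, numbers) and their short BCP 47 keys, does not translate legacy value names (an old-style value such as `gregorian` is rejected, not mapped to `gregory`), and passes the part before `@` through untouched.

```rust
/// Converts `ar@numbers=latn` into `ar-u-nu-latn`.
/// Hypothetical helper for illustration; not part of ICU4X.
///
/// Returns `None` for unknown keys, malformed keywords, or values that are
/// not 3 to 8 ASCII alphanumerics. Inputs without `@` are returned unchanged.
pub fn convert_old_syntax(input: &str) -> Option<String> {
    let Some((base, keywords)) = input.split_once('@') else {
        return Some(input.to_string());
    };
    if base.is_empty() || keywords.is_empty() {
        return None;
    }

    let mut pairs: Vec<(&str, String)> = Vec::new();
    for keyword in keywords.split(';') {
        let (key, value) = keyword.split_once('=')?;
        // Assumed subset of legacy key names and their short forms.
        let short = match key.to_ascii_lowercase().as_str() {
            "calendar" => "ca",
            "collation" => "co",
            "currency" => "cu",
            "numbers" => "nu",
            _ => return None,
        };
        let well_formed = (3..=8).contains(&value.len())
            && value.chars().all(|c| c.is_ascii_alphanumeric());
        if !well_formed {
            return None;
        }
        pairs.push((short, value.to_ascii_lowercase()));
    }

    // Emit keys in alphabetical order so the output is deterministic.
    pairs.sort();
    let mut out = base.to_string();
    out.push_str("-u");
    for (key, value) in pairs {
        out.push('-');
        out.push_str(key);
        out.push('-');
        out.push_str(&value);
    }
    Some(out)
}
```

A caller would then hand the returned string to the strict locale parser, so the core parsing code stays unchanged and all leniency lives in the separate conversion step.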

Jan 31 '24 18:01 zbraniecki

Discuss with:

@zbraniecki
@sffc

Optional:

@echeran

Feb 01 '24 18:02 sffc
