
Constraints on @locale values

Open aphillips opened this issue 2 years ago • 6 comments

In #450 we are considering adding a contextual expression attribute for the locale. The current proposal says that the value must be a BCP47 language tag (or a comma-separated sequence of tags, i.e. a language priority list). We should agree on the level of validation required for conformance.

Generally speaking, we should probably require well-formed tags (in BCP47's sense of well-formed), at least at the syntax level. Beyond that, we could require that the tags be valid (which requires checking that the subtags are in the registry), or we could require that the tags be valid Unicode Locale Identifiers (ULIs), which carry further canonicalization requirements.
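To make the lowest of these levels concrete, here is a minimal sketch of a syntax-only check in Python. It assumes a simplified reading of the `langtag` and `privateuse` productions in RFC 5646 and deliberately skips the grandfathered tags; the names `is_well_formed` and `all_well_formed` are made up for this illustration and come from no spec or library. It performs no registry lookup and no canonicalization, so it says nothing about validity or ULI conformance.

```python
import re

# Simplified RFC 5646 grammar (assumption: grandfathered tags are not handled).
_LANGTAG = re.compile(
    r"""
    (?:
        (?:
            [a-z]{2,3}(?:-[a-z]{3}){0,3}            # language, up to three extlangs
          | [a-z]{4,8}                              # reserved / registered language
        )
        (?:-[a-z]{4})?                              # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                 # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*    # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*        # extensions (any singleton but x)
        (?:-x(?:-[a-z0-9]{1,8})+)?                  # trailing private use
      | x(?:-[a-z0-9]{1,8})+                        # private-use-only tag
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the simplified grammar above (syntax only)."""
    return _LANGTAG.fullmatch(tag) is not None


def all_well_formed(value: str) -> bool:
    """Check every tag in a comma-separated language priority list."""
    return all(is_well_formed(part.strip()) for part in value.split(","))
```

A tag such as `en-u-ca-gregory` passes this check because its extension is syntactically fine; whether the extension's keys and values mean anything is exactly what the stricter levels would have to decide.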

Writing this issue to provide a separate place to discuss.

aphillips avatar Aug 15 '23 14:08 aphillips

I think we should instead use https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#Unicode_locale_identifier.

macchiati avatar Aug 23 '23 17:08 macchiati

@macchiati ULIs are one of the options listed. I do note that the requirements on ULIs have the disadvantage of not being strictly linked to the BCP47 grammar, that is, it is necessary to process the tag to know if it conforms. Implementations may not all have the ability to do the necessary validation and canonicalization. Perhaps we could say something like:

The value of a locale attribute MUST be a sequence of well-formed BCP47 language tags. Each language tag SHOULD be a valid Unicode Locale Identifier (ULI).
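One way an implementation might reflect that MUST/SHOULD split is to treat a tag that is not well-formed as an error and a tag that is well-formed but not a valid ULI as a warning. The sketch below is only an illustration of that idea: `LocaleCheck`, `check_locale_value`, and the two predicate parameters are hypothetical names, and the predicates are passed in by the caller because this thread does not settle how either check would be implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LocaleCheck:
    tags: List[str]
    errors: List[str] = field(default_factory=list)    # MUST violations
    warnings: List[str] = field(default_factory=list)  # SHOULD violations


def check_locale_value(
    value: str,
    is_well_formed: Callable[[str], bool],  # hypothetical BCP47 syntax check
    is_valid_uli: Callable[[str], bool],    # hypothetical ULI check; may be lenient
) -> LocaleCheck:
    """Classify each tag in a comma-separated locale value."""
    tags = [part.strip() for part in value.split(",")]
    result = LocaleCheck(tags=tags)
    for tag in tags:
        if not is_well_formed(tag):
            result.errors.append(tag)      # fails the MUST
        elif not is_valid_uli(tag):
            result.warnings.append(tag)    # fails only the SHOULD
    return result
```

An implementation that cannot do ULI validation could pass a predicate that always returns True and would still satisfy the MUST part of the proposed wording.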

aphillips avatar Aug 27 '23 17:08 aphillips

I'm not completely certain how to express it, but in practice the JS implementation will almost certainly apply this validation to language tags as that's what we use for the formatter's locales argument.

So any spec text which allows for that practice to continue would be good.

eemeli avatar Aug 27 '23 18:08 eemeli

@eemeli That validation is ULI (although it's probably not a good thing that it is separate from Unicode's definition, given that one or the other might change...)

I'm hesitant to require every implementation everywhere to use ULI simply because some implementations won't support the various extensions and such--but can still use the language tag bits.

Many specifications "require" users to be more strict than what implementations are required to enforce. The way to say that would be:

The value of a locale MUST be a sequence of valid Unicode Locale Identifiers. Implementations are only required to validate that each ULI language tag is a well-formed BCP47 language tag.

aphillips avatar Aug 27 '23 18:08 aphillips

Depends on expression attributes being accepted (#450)

aphillips avatar Jan 08 '24 15:01 aphillips

We decided to make expression attributes reserved for future standardization. As a result, this item is out of scope for LDML45.

aphillips avatar Jan 19 '24 23:01 aphillips

@locale was replaced by :u:locale, so this is obsolete.

aphillips avatar Sep 09 '24 16:09 aphillips