
Investigate use cases for en-US-posix collation data; adjust datagen ergonomics accordingly

Open hsivonen opened this issue 8 months ago • 0 comments

It appears that the purpose of the en-US-posix locale is to represent the C/POSIX locale in the BCP47 value space.

CLDR has a special collation tailoring for en_US_POSIX.

Firefox wants the C locale to behave like en-US, so the tailoring is unwanted. I expect command-line utilities to want to use the C locale to opt into lexical sort on bytes without calling into a real collator, so the tailoring would be unnecessary data.
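To make the "lexical sort on bytes" case concrete, here is a minimal sketch using only the Rust standard library. It is an illustration of what a command-line utility might do for the C locale, not ICU4X API; the function name `sorted_c_locale` is hypothetical.

```rust
/// Hypothetical helper: returns the items sorted the way the C/POSIX locale
/// orders them, i.e. by comparing raw bytes. No collator and no collation
/// data are involved.
fn sorted_c_locale<'a>(items: &[&'a str]) -> Vec<&'a str> {
    let mut out = items.to_vec();
    // Compare the underlying UTF-8 bytes. For valid UTF-8 this is the same
    // as code point order, and it is also what `str`'s own `Ord` does.
    out.sort_unstable_by(|a, b| a.as_bytes().cmp(b.as_bytes()));
    out
}
```

With this ordering, all ASCII uppercase letters sort before all lowercase ones, and non-ASCII characters such as "é" sort after "z". That differs from what an en-US collator would produce, which is why a tool that wants byte order has no need for the tailoring.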

We should investigate use cases for the tailoring and adjust datagen accordingly (either to exclude it by default or to make it super-easy to exclude).

hsivonen • Apr 30 '25 10:04