icu4x
Investigate use cases for en-US-posix collation data and adjust datagen ergonomics accordingly
It appears that the purpose of the en-US-posix locale is to represent the C/POSIX locale in the BCP47 value space.
CLDR has a special collation tailoring for en_US_POSIX.
Firefox wants the C locale to behave like en-US, so the tailoring is unwanted. I expect command-line utilities to want to use the C locale to opt into lexical sort on bytes without calling into a real collator, so the tailoring would be unnecessary data.
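To illustrate the command-line case, here is a minimal sketch of a lexical sort on bytes that needs no collator and no collation data. The function name `sort_bytewise` is made up for illustration and is not part of ICU4X.

```rust
/// Sorts strings by their raw UTF-8 bytes, with no collator or locale data involved.
/// `sort_bytewise` is a hypothetical helper name used only for this sketch.
fn sort_bytewise(items: &mut [String]) {
    // Compare the underlying byte slices directly; uppercase ASCII sorts before
    // lowercase ASCII, and non-ASCII characters sort after all ASCII.
    items.sort_unstable_by(|a, b| a.as_bytes().cmp(b.as_bytes()));
}
```

Sorting `["b", "a", "B", "é"]` this way gives `["B", "a", "b", "é"]`, which is the kind of ordering I would expect such utilities to want from the C locale.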
We should investigate use cases for the tailoring and adjust datagen accordingly (either to exclude it by default or to make it super-easy to exclude).