CLDR-17223 Use new menu attribute in territory display names
CLDR-17223
- [ ] This PR completes the ticket.
I'm starting with just English. I've already found an odd case: we have "Cocos (Keeling) Islands", which isn't easily constructible from a menu glue pattern.
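To make the Cocos case concrete, here is a minimal sketch of composing a menu name from a core and an extension. The glue pattern `{0} ({1})` and the core/extension split below are assumptions for illustration, not the actual CLDR data or attribute values.

```js
// Hypothetical menu glue pattern: {0} = core, {1} = extension (assumption, not CLDR data).
const MENU_GLUE = "{0} ({1})";

// Substitute the core and extension into the glue pattern by placeholder index.
function composeMenuName(core, extension, glue = MENU_GLUE) {
  const parts = [core, extension];
  return glue.replace(/\{(\d)\}/g, (_, i) => parts[Number(i)]);
}

// Myanmar fits: a trailing parenthetical splits cleanly into core + extension.
console.log(composeMenuName("Myanmar", "Burma")); // "Myanmar (Burma)"

// Cocos does not: the parenthetical sits mid-name, so this split yields a different string.
console.log(composeMenuName("Cocos Islands", "Keeling")); // "Cocos Islands (Keeling)"
```

With a suffix-style glue like this, no core/extension split reproduces "Cocos (Keeling) Islands"; it would need either a different pattern or a core that is not a simple prefix.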
Please give feedback and suggestions. I would like to land this change relatively quickly because it blocks ICU4X.
My preference would be to just drop the extensions:
`English (Myanmar)`, not `English (Myanmar [Burma])` or `English (Myanmar (Burma))`. So I definitely favor adding menu core/extension values for Myanmar and Cocos Islands. Nonetheless, I'm happy to accept your solution, since you are closer to the problem space.
So, in the specific case of Myanmar (Burma), maybe it's time to just drop the parenthetical. But there are hundreds of other cases, and I'm trying to find the correct general solution. Here are more examples:
Alternative names for the same territory:
- Falkland Islands (Islas Malvinas)
- Aotearoa (Nouvelle-Zélande), in English "Aotearoa (New Zealand)"
- Congo (RDC), in English "Congo (DRC)"
Clarification about who owns a particular territory:
- Виргинские о-ва (США), in English "Virgin Islands (US)"
- Макао (САР), in English "Macao (SAR)"
So unless we want to use a pattern that avoids parentheses entirely (which I'm open to exploring), we need to decide what happens with nested parentheses.
I commented on an error you'll need to fix to run the CLDR modify script.
Fixed, thanks!
Btw, I'm a bit confused, because the test data already shows `en-MM; English (Myanmar [Burma])` before this change. I suspect something hard-coded is already producing the nested brackets: https://github.com/unicode-org/cldr/blob/main/common/testData/localeIdentifiers/localeDisplayName.txt#L923-L933
There's hacky code somewhere that replaces '(' and ')' with '[' and ']'. UTS 35 says:
> When the display name contains "(" or ")" characters (or full-width equivalents), replace them by "[", "]" (or full-width equivalents) before adding.
https://unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm
I claim that this is terrible for both quality and implementability, and I want to improve it.
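As a rough illustration of the UTS 35 substitution rule quoted above, the sketch below escapes a region name before inserting it into a locale pattern. The pattern `{0} ({1})` is assumed as the English locale display pattern, and `escapeParens`/`formatLocaleName` are hypothetical names, not the CLDR or ICU implementation. Only the region name is escaped here, to keep the example small.

```js
// Parentheses (ASCII and full-width) mapped to their bracket replacements.
const BRACKET_SUBSTITUTIONS = {
  "(": "[",
  ")": "]",
  "\uFF08": "\uFF3B", // FULLWIDTH LEFT PARENTHESIS -> FULLWIDTH LEFT SQUARE BRACKET
  "\uFF09": "\uFF3D", // FULLWIDTH RIGHT PARENTHESIS -> FULLWIDTH RIGHT SQUARE BRACKET
};

// Replace every parenthesis in a display name with the matching bracket.
function escapeParens(displayName) {
  return displayName.replace(/[()\uFF08\uFF09]/g, (ch) => BRACKET_SUBSTITUTIONS[ch]);
}

// Assumed locale pattern: {0} = language name, {1} = (escaped) region name.
function formatLocaleName(languageName, regionName, pattern = "{0} ({1})") {
  const parts = [languageName, escapeParens(regionName)];
  return pattern.replace(/\{(\d)\}/g, (_, i) => parts[Number(i)]);
}

console.log(formatLocaleName("English", "Myanmar (Burma)")); // "English (Myanmar [Burma])"
```

This reproduces the `en-MM` line in the test data, which shows how the current rule leads to the bracketed output.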
Make sure to update the display name documentation:
https://github.com/unicode-org/cldr/blob/main/docs/ldml/tr35-general.md?plain=1#L135-L163
Yep, I'll work on that once we have alignment on the approach.
If we consider the contents of the parenthetical to be "optional", another approach could be to include the parenthetical when formatting a region display name, but drop it when formatting a locale display name.
```js
new Intl.DisplayNames("en", { type: "region" }).of("MM")
// => "Myanmar (Burma)"
new Intl.DisplayNames("en", { type: "language" }).of("en-MM")
// => "English (Myanmar)" under this approach,
//    NOT "English (Myanmar [Burma])"?
```
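A minimal sketch of that alternative, assuming each territory name is stored as a menu core plus an optional extension. The `TERRITORIES` table and both functions are hypothetical and only illustrate the idea; they are not the actual CLDR menu attribute data or any ICU API.

```js
// Hypothetical territory data split into a menu core and an optional extension.
const TERRITORIES = {
  MM: { core: "Myanmar", extension: "Burma" },
  FR: { core: "France" },
};

// Region display name: keep the extension, using an assumed "{0} ({1})" glue.
function regionDisplayName(code) {
  const { core, extension } = TERRITORIES[code];
  return extension ? `${core} (${extension})` : core;
}

// Locale display name: use only the core, so the extension can never nest.
// Assumes the English locale pattern "{0} ({1})".
function localeDisplayName(languageName, regionCode) {
  return `${languageName} (${TERRITORIES[regionCode].core})`;
}

console.log(regionDisplayName("MM"));            // "Myanmar (Burma)"
console.log(localeDisplayName("English", "MM")); // "English (Myanmar)"
```

This only helps where the parenthetical is a trailing extension; a name like "Cocos (Keeling) Islands" would still carry its parentheses into the locale name unless its core is defined without them.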
We discussed this in the CLDR Design WG, and an alternative approach was preferred: #5240