
CLDR-17223 Use new menu attribute in territory display names

Open sffc opened this issue 1 month ago • 2 comments

CLDR-17223

  • [ ] This PR completes the ticket.

I'm doing just English to start. I already found an odd case: we have "Cocos (Keeling) Islands" which isn't easily constructible from a menu glue pattern.
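To make the odd case concrete, here is a minimal sketch. It assumes a menu glue pattern of the form "{0} ({1})" (core name, then the extension in parentheses). The pattern string and the helper names `applyGlue` and `splitMenuName` are hypothetical and are not CLDR data or API. The sketch checks whether a full territory name can be reproduced from a core plus an extension. "Myanmar (Burma)" can, but "Cocos (Keeling) Islands" cannot, because its parenthetical sits in the middle of the name.

```javascript
// Hypothetical menu glue pattern: core name, then the extension in parentheses.
const MENU_GLUE_PATTERN = "{0} ({1})";

// Fill the glue pattern. Function replacers avoid special "$" handling.
function applyGlue(core, extension) {
  return MENU_GLUE_PATTERN
    .replace("{0}", () => core)
    .replace("{1}", () => extension);
}

// Try to split a full name into { core, extension } such that applyGlue
// reproduces it exactly; return null if the name does not fit the pattern.
function splitMenuName(fullName) {
  const match = /^(.+) \(([^()]+)\)$/.exec(fullName);
  if (match === null) return null;
  const [, core, extension] = match;
  return applyGlue(core, extension) === fullName ? { core, extension } : null;
}

console.log(splitMenuName("Myanmar (Burma)"));
// → core "Myanmar", extension "Burma"
console.log(splitMenuName("Cocos (Keeling) Islands"));
// → null
```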

Please give feedback and suggestions. I would like to land this change relatively quickly because it blocks ICU4X.

sffc, Dec 09 '25 23:12

My preference would be to just drop the extensions: "English (Myanmar)", not "English (Myanmar [Burma])" or "English (Myanmar (Burma))". So I definitely favor adding menu core/extension values for Myanmar and the Cocos Islands. Nonetheless, I'm happy to accept your solution, since you are closer to the problem space.

So, in the specific case of Myanmar (Burma), maybe it's time to just drop the parenthetical. But there are hundreds of other cases, and I'm trying to find the correct general solution. Here are more examples:

Alternative names for the same territory:

  • Falkland Islands (Islas Malvinas)
  • Aotearoa (Nouvelle-Zélande)
  • Congo (RDC)

Clarification about who owns a particular territory:

  • Виргинские о-ва (США)
  • Макао (САР)

So unless we want to use a pattern that avoids parentheses entirely (which I'm open to exploring), we need to decide what happens with nested parentheses.

I commented on an error you'll need to fix to run the CLDR modify script.

Fixed, thanks!

Btw, I'm a bit confused: the test data already shows en-MM; English (Myanmar [Burma]), even before this change. I suspect something hard-coded is already producing the nested brackets.

https://github.com/unicode-org/cldr/blob/main/common/testData/localeIdentifiers/localeDisplayName.txt#L923-L933

There's hacky code somewhere that does a string substitution from '(' to '['. UTS 35 says:

When the display name contains "(" or ")" characters (or full-width equivalents), replace them by "[", "]" (or full-width equivalents) before adding.

https://unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm
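The quoted rule can be sketched as follows. This is a minimal illustration, not ICU's or CLDR's implementation. The function names are hypothetical, and the sketch assumes an English-style pattern "{0} ({1})" with a single region qualifier. It treats U+FF08/U+FF09 as the full-width parentheses and U+FF3B/U+FF3D as the full-width brackets, and it applies the substitution only to the region name before that name is added.

```javascript
// Each ASCII or full-width parenthesis maps to the corresponding bracket.
const BRACKET_FOR = {
  "(": "[",
  ")": "]",
  "\uFF08": "\uFF3B", // FULLWIDTH LEFT PARENTHESIS -> FULLWIDTH LEFT SQUARE BRACKET
  "\uFF09": "\uFF3D", // FULLWIDTH RIGHT PARENTHESIS -> FULLWIDTH RIGHT SQUARE BRACKET
};

// Replace parentheses in a qualifier name before it is added.
function replaceParens(name) {
  return name.replace(/[()\uFF08\uFF09]/g, (ch) => BRACKET_FOR[ch]);
}

// Assumed English-style pattern with a single region qualifier.
function formatLocaleName(languageName, regionName) {
  return "{0} ({1})"
    .replace("{0}", () => languageName)
    .replace("{1}", () => replaceParens(regionName));
}

console.log(formatLocaleName("English", "Myanmar (Burma)"));
// → English (Myanmar [Burma])
```

The outer parentheses stay unambiguous, at the cost of rewriting the spelling of the region name itself.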

I claim that this is terrible for both quality and implementability, and I want to improve it.

Make sure to update the display name documentation:

https://github.com/unicode-org/cldr/blob/main/docs/ldml/tr35-general.md?plain=1#L135-L163

Yep, I'll work on that once we have alignment on the approach.

sffc, Dec 11 '25 23:12

If we consider the contents of the parenthetical to be "optional", another approach could be to include the parenthetical when formatting a region display name, but drop it when formatting a locale display name.

new Intl.DisplayNames("en", { type: "region" }).of("MM")
// => "Myanmar (Burma)"

new Intl.DisplayNames("en", { type: "language" }).of("en-MM")
// => "English (Myanmar)"
//    NOT "English (Myanmar [Burma])" ?

sffc, Dec 11 '25 23:12

We discussed this in CLDR Design WG and an alternative approach was preferred. #5240

sffc, Dec 17 '25 01:12