planetiler

Accept Unicode extensions in localized name subkeys

Open 1ec5 opened this issue 11 months ago • 1 comments

The code for validating name:*=* keys rejects extension syntaxes that are allowed under BCP 47.

For example, this peak in Hong Kong has name:en-t-zh=* set to an English translation of a Chinese name. While a user is unlikely to specifically ask for a map labeled by mode of derivation, they might be interested in labels that take one side or another in a geopolitical dispute. For instance, this lake’s name is disputed between two U.S. states; I’ve tagged each state’s recognized name in name:en-u-sd-usnc=* and name:en-u-sd-usva=*, using Unicode subdivision identifiers. This syntax is also being suggested for a more prominent dispute in current events.

These keys fail validation because the regular expression doesn’t account for the one-letter -t- or -u- extension singleton:

https://github.com/onthegomap/planetiler/blob/4ecb02d136fca611b920abb17e8d60e66f21d8d2/planetiler-core/src/main/java/com/onthegomap/planetiler/util/LanguageUtils.java#L13

Names in Gallo are tagged name:fr-x-gallo=* because no ISO 639 code has been assigned yet. This does pass because we’re specifically looking for -x-.

Instead of continuing to patch this homegrown regular expression, we could look around for more comprehensive ones such as this JavaScript implementation.
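For illustration, here is a rough sketch of what a broader pattern could look like in Java. It follows the `langtag` production of BCP 47 in simplified form and is not the expression currently in LanguageUtils.java. The class and method names are hypothetical, and the sketch deliberately leaves out extlang subtags, grandfathered tags, and tags that consist only of a private-use sequence.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Planetiler: a simplified BCP 47 "langtag" check.
final class LanguageSubkeys {
  private static final Pattern WELL_FORMED = Pattern.compile(
    "[a-z]{2,8}"                                     // primary language subtag
      + "(?:-[a-z]{4})?"                             // optional script, e.g. Hant
      + "(?:-(?:[a-z]{2}|[0-9]{3}))?"                // optional region, e.g. TW or 419
      + "(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"   // variants, e.g. gallo or 1996
      + "(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*"       // extensions such as -t- and -u-
      + "(?:-x(?:-[a-z0-9]{1,8})+)?",                // private use, e.g. -x-gallo
    Pattern.CASE_INSENSITIVE);

  // Returns true if the whole subkey matches the simplified grammar above.
  static boolean isWellFormed(String subkey) {
    return WELL_FORMED.matcher(subkey).matches();
  }
}
```

With this pattern, the subkeys mentioned above (en-t-zh, en-u-sd-usnc, fr-x-gallo) are accepted, while an extension singleton with nothing after it (en-t) is rejected.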

1ec5 avatar Jan 29 '25 13:01 1ec5

Per discussion from #1184 let's also include breaking up this regex so it just validates the language subkey so it can be used with other name prefixes as well.
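One way the split could look, as a sketch only: the key prefix is stripped first, and the remaining subkey is handed to a separate language-tag check. The class name, method name, and the idea of passing the check as a `Predicate` are assumptions for illustration, not existing Planetiler API.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Planetiler: separates the key prefix
// from the language subkey so one validator can serve any prefix.
final class LocalizedKeys {
  // Returns the language subkey of a key such as "name:en-t-zh" when the key
  // starts with "<prefix>:" and the subkey passes the supplied check.
  static Optional<String> languageSubkey(String key, String prefix,
      Predicate<String> isLanguageTag) {
    String head = prefix + ":";
    if (!key.startsWith(head)) {
      return Optional.empty();
    }
    String subkey = key.substring(head.length());
    return isLanguageTag.test(subkey) ? Optional.of(subkey) : Optional.empty();
  }
}
```

The same call would then work for name:*, alt_name:*, or any other prefix, with the validation logic living in one place.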

msbarry avatar Feb 24 '25 11:02 msbarry

> Names in Gallo are tagged name:fr-x-gallo=* because no ISO 639 code has been assigned yet. This does pass because we’re specifically looking for -x-.

Actually, there is a registered variant subtag for Gallo. I’ve proposed to retag name:fr-x-gallo=* as name:fr-gallo=*, which Planetiler would already support. But the point stands that private-use extensions are worth supporting anyway.

1ec5 avatar Nov 23 '25 23:11 1ec5

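Instead of maintaining our own validator, we could pass the subkey to Locale.forLanguageTag() and check the result. Note that forLanguageTag() only throws an NPE for a null argument and silently drops ill-formed subtags, so the stricter option is new Locale.Builder().setLanguageTag(), which throws IllformedLocaleException for an ill-formed tag. If performance is a concern, we could cache the result for a given name:*=* subkey in a hash map.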
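A minimal sketch of the caching idea, assuming the check is done with `Locale.Builder.setLanguageTag()`, which throws `IllformedLocaleException` on ill-formed input. The class name and the choice of `ConcurrentHashMap` are assumptions for illustration. The JDK parser's notion of well-formedness may not line up exactly with what the current regular expression accepts, so the behavior would need to be compared before swapping it in.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Planetiler: delegates the well-formedness
// check to the JDK and remembers the answer for each subkey.
final class CachedLanguageTagValidator {
  private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

  static boolean isWellFormed(String subkey) {
    return CACHE.computeIfAbsent(subkey, CachedLanguageTagValidator::check);
  }

  private static boolean check(String subkey) {
    if (subkey.isEmpty()) {
      return false; // setLanguageTag("") resets the builder instead of throwing
    }
    try {
      new Locale.Builder().setLanguageTag(subkey);
      return true;
    } catch (IllformedLocaleException e) {
      return false;
    }
  }
}
```

Since the set of distinct name:*=* subkeys in a planet file is small compared to the number of features, the map should stay small and the parser would run only once per distinct subkey.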

1ec5 avatar Nov 24 '25 04:11 1ec5