[BUG] Unexpected behaviour of language config

Open millionpoundhat opened this issue 1 month ago • 1 comments

Describe the bug

When configuring languages in a .properties file, a user can get unexpected behavior from Wikidata translations by capitalizing part of a hyphenated language code, e.g. zh-Hans vs zh-hans. The language codes are passed through to the SPARQL query in wikidata.java with their capitalization intact, so they do not match anything in Wikidata. When looking up language codes online it's common to find hyphenated examples that are capitalised.

To Reproduce

Try running the SPARQL query with some values against the wikidata query service:


SELECT ?label ?lang WHERE {
  # For specific Wikidata entities
  VALUES ?id { wd:Q1726 }

  # Get the labels (names) in various languages
  ?id rdfs:label ?label .

  # Only fetch labels in the languages we care about
  FILTER(LANG(?label) IN ("zh-Hans", "zh-Hant"))

  # Extract the language code from the label
  BIND(LANG(?label) as ?lang)
}

vs the same query with lowercase language codes:


SELECT ?label ?lang WHERE {
  # For specific Wikidata entities
  VALUES ?id { wd:Q1726 }

  # Get the labels (names) in various languages
  ?id rdfs:label ?label .

  # Only fetch labels in the languages we care about
  FILTER(LANG(?label) IN ("zh-hans", "zh-hant"))

  # Extract the language code from the label
  BIND(LANG(?label) as ?lang)
}

Because the case used in the configured language list also sets the attribute in the tileset, I would simply propose that we add toLowerCase() only when formatting the SPARQL query in wikidata.java, rather than changing the configured values themselves. I'm happy to open a PR for this.
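A minimal sketch of what the proposed change could look like, assuming the filter clause is assembled from the configured language list. The class name `SparqlLanguageFilter` and the method `build` are hypothetical and are not taken from wikidata.java; only the idea of calling toLowerCase() while formatting the query comes from the proposal above.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual planetiler code.
final class SparqlLanguageFilter {

  /**
   * Builds a FILTER clause that restricts labels to the given languages,
   * lowercasing each tag so that "zh-Hans" and "zh-hans" behave the same.
   */
  static String build(List<String> languages) {
    String quoted = languages.stream()
      // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i)
      .map(lang -> lang.toLowerCase(Locale.ROOT))
      // drop duplicates that only differed by case
      .distinct()
      .map(lang -> "\"" + lang + "\"")
      .collect(Collectors.joining(", "));
    return "FILTER(LANG(?label) IN (" + quoted + "))";
  }
}
```

With this sketch, `SparqlLanguageFilter.build(List.of("zh-Hans", "zh-Hant"))` produces the filter line of the lowercase query shown earlier. The labels returned would then carry lowercase tags, so code that keys tile attributes by the configured spelling would likely need to map them back; that step is not shown here.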

millionpoundhat, Dec 10 '25 11:12

Good catch! I think automatically lower-casing these language tags should be fine. @1ec5, do you see any issue with this? If not, then feel free to open a PR for this.

msbarry, Dec 12 '25 11:12

Yes, I think lowercasing the language tag should be safe. Mixed case is strongly recommended by ISO and IETF, but technically everything is supposed to also handle them case-insensitively. I think Wikidata lowercases the codes for consistency with their subdomains. It seems like there’s little appetite for moving Wikidata to mixed-case language codes, but you could bake the LCASE() into the query to be extra safe.
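One way the LCASE() suggestion could be combined with lowercasing on the client side is sketched below. The class `CaseInsensitiveLabelQuery` and its `build` method are hypothetical names used for illustration, and the query shape is adapted from the reproduction queries in this issue rather than from the real query in wikidata.java. LCASE is a standard SPARQL 1.1 string function.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual planetiler code.
final class CaseInsensitiveLabelQuery {

  /** Builds a label query that compares language tags case-insensitively. */
  static String build(String qid, List<String> languages) {
    // Lowercase the configured tags on the client side...
    String tags = languages.stream()
      .map(lang -> "\"" + lang.toLowerCase(Locale.ROOT) + "\"")
      .collect(Collectors.joining(", "));
    // ...and lowercase the label's tag on the server side with LCASE(),
    // so the match would still hold if Wikidata ever returned mixed-case tags.
    return String.join("\n",
      "SELECT ?label ?lang WHERE {",
      "  VALUES ?id { wd:" + qid + " }",
      "  ?id rdfs:label ?label .",
      "  BIND(LCASE(LANG(?label)) AS ?lang)",
      "  FILTER(?lang IN (" + tags + "))",
      "}");
  }
}
```

Calling `CaseInsensitiveLabelQuery.build("Q1726", List.of("zh-Hans", "zh-Hant"))` yields a query whose `?lang` values are always lowercase, whichever spelling the user configured.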

1ec5, Dec 18 '25 05:12