
Inconsistently localized region names

Open metmarkosaric opened this issue 2 years ago • 4 comments

Past Issues Searched

  • [X] I have searched open and closed issues to make sure that the bug has not yet been reported

Issue is a Bug Report

  • [X] This is a bug report and not a feature request, nor asking for self-hosted support

Describe the bug

One thing I have noticed is that when looking at region names in Norway, a few regions appear with their Sami name instead of the Bokmål one (language code nb), which is what I would expect. I am unsure of the exact Sami language code: it may be the generic smi, or the more specific sma for Southern Sami.

(reported via email)

Expected behavior

Trööndelage => Trøndelag, Romssa ja Finnmárkku => Troms og Finnmark

If there are more examples, please do add them in this thread

Screenshots

No response

Environment

- OS:
- Browser:
- Browser Version:

metmarkosaric avatar Mar 08 '22 11:03 metmarkosaric

The (at least 3) different Sami languages don't have names for most (non-Sami) places.

"Trööndelage" seems to be Southern Sami, which is still an official language and so qualifies as "in Norway". The only language code I could find at http://i18n.skolelinux.no/localekoder.txt is the one for Northern Sami, whereas if (in all likelihood) the "in Norway" user wants Norwegian Bokmål, it should be "nb_NO".

It could be you just selected something else (?), and it all depends on what the user selects.

comradekingu avatar Apr 09 '22 22:04 comradekingu

We've got a similar report from Finland where local regions are displayed in the Swedish language. Example here:

- Swedish (how we display): Norra Österbotten
- Finnish: Pohjois-Pohjanmaa
- English: North Ostrobothnia

metmarkosaric avatar May 23 '22 13:05 metmarkosaric

Kind of related, I'm adding this here because it's the only open issue to do with location names. There's a suggestion that a suburb of London should be categorized under London, which we should also handle when we address this issue: https://github.com/plausible/analytics/issues/1909

ukutaht avatar May 25 '22 08:05 ukutaht

CLDR can be used to map languages to territories, but it doesn't help much with sub-national divisions. https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/territory_language_information.html

> We've got a similar report from Finland where local regions are displayed in the Swedish language.

That's not necessarily wrong. In Finland most municipalities are bilingual (usually with some combination of Finnish, Swedish, Sami), but some municipalities are unilingually Finnish or Swedish. Maybe some of the existing work by others in OpenStreetMap and Wikidata can be reused? https://blog.mapbox.com/exploring-the-world-with-wikidata-and-openstreetmap-30f1bfe954d3 https://blog.mapbox.com/support-for-arabic-and-portuguese-in-mapbox-streets-5a9690dabff4
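As a rough illustration of the language question discussed in this thread, here is a minimal Python sketch that picks a display name for a region from several localized names, given an ordered list of preferred languages. Everything in it is hypothetical: the `REGION_NAMES` table, its keys, the language tags attached to each name, and the `pick_region_name` helper are assumptions for illustration, not Plausible's code or the contents of any geolocation database.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical sample data. The names come from this thread; the keys and
# the language tags assigned to each name are assumptions for illustration.
REGION_NAMES = {
    "trondelag": {"nb": "Trøndelag", "sma": "Trööndelage"},
    "troms-og-finnmark": {"nb": "Troms og Finnmark", "se": "Romssa ja Finnmárkku"},
    "north-ostrobothnia": {
        "fi": "Pohjois-Pohjanmaa",
        "sv": "Norra Österbotten",
        "en": "North Ostrobothnia",
    },
}


def pick_region_name(
    names: Mapping[str, str], preferred: Sequence[str]
) -> Optional[str]:
    """Return the first name whose language tag appears in `preferred`.

    If none of the preferred languages is available, fall back to the
    first name in the mapping, or None when the mapping is empty.
    """
    for lang in preferred:
        if lang in names:
            return names[lang]
    return next(iter(names.values()), None)


# Prefer Bokmål, then English, for a Norwegian region.
print(pick_region_name(REGION_NAMES["trondelag"], ["nb", "en"]))
# Prefer Finnish, then Swedish, for a Finnish region.
print(pick_region_name(REGION_NAMES["north-ostrobothnia"], ["fi", "sv"]))
```

The preference list is left to the caller on purpose, since, as noted above, showing a Swedish or Sami name is not necessarily wrong for a given region.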

nemobis avatar Jul 01 '22 20:07 nemobis

We did some more testing of different geolocation databases, and we have gone live with a switch of the database provider.

We're now using the MaxMind database.

No database is 100% accurate unfortunately, but this change should provide some more accuracy in terms of cities.

Details: https://plausible.io/docs/countries#how-it-works

bogplau avatar Jan 18 '23 11:01 bogplau