
Multilinguality

Open: pnorman opened this issue 1 year ago • 11 comments

Multilinguality is an important goal and we should support it.

The use cases I see for multi-lingual names are

  • displaying a label in the user's language
  • displaying a label in a language the user is more likely to understand than the local language (e.g. English is better than Chinese for someone who can only read Latin script)
  • adding glosses (e.g. "Aachen (Aix-la-Chapelle)") to labels
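To make the first two use cases concrete, here is a minimal client-side sketch. It assumes features carry `name` plus optional `name_<code>` attributes; `pick_label`, `is_latin` and the `reads_only_latin` flag are hypothetical, and the Latin-script test is a crude heuristic based on Unicode character names, not real script detection.

```python
import unicodedata
from typing import Dict, List, Optional


def is_latin(text: str) -> bool:
    """Crude heuristic: True if every letter in `text` is a Latin-script letter.

    Digits, spaces and punctuation are ignored; the check relies on Unicode
    character names starting with "LATIN".
    """
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def pick_label(
    props: Dict[str, str], preferred: List[str], reads_only_latin: bool = False
) -> Optional[str]:
    """Choose one label from hypothetical `name` / `name_<code>` properties."""
    # 1. The first of the user's languages that the feature has a name in.
    for lang in preferred:
        value = props.get(f"name_{lang}")
        if value:
            return value
    local = props.get("name")
    # 2. A Latin-only reader gets English instead of a non-Latin local name.
    if reads_only_latin and local and not is_latin(local):
        return props.get("name_en") or local
    # 3. Otherwise (or if there is no English name) use the local name.
    return local
```

For a feature with `name` 会津若松市 and `name_en` Aizuwakamatsu, a German user who only reads Latin script gets "Aizuwakamatsu" because there is no German name; without that flag the label stays 会津若松市.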

It will be up to the tileset authors which languages they want to support. This will gracefully degrade because not supporting a language is the same as that language not being on an object.

We do not want to fill in every supported language on every feature. For example, a tileset with English and German would include 会津若松市 as the name and Aizuwakamatsu as the English name, with nothing as the German name, even though Germans would generally prefer the English name over the Japanese one.

pnorman, Nov 15 '24 21:11

My preferred approach is to extend what we're doing with English and German to other languages: for each language the tileset supports, an object gets a name_<code> attribute when it has a name in that language.
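A minimal sketch of that approach, assuming the tileset author supplies a fixed language list. `name_attributes` and its `languages` argument are hypothetical names, not part of any Shortbread implementation.

```python
from typing import Dict, Iterable


def name_attributes(tags: Dict[str, str], languages: Iterable[str]) -> Dict[str, str]:
    """Copy `name` and the configured `name:<code>` tags to `name_<code>`."""
    attrs: Dict[str, str] = {}
    if "name" in tags:
        attrs["name"] = tags["name"]
    for code in languages:
        value = tags.get(f"name:{code}")
        # A missing translation is simply left out; nothing is filled in
        # from another language.
        if value:
            attrs[f"name_{code}"] = value
    return attrs
```

With tags `name=会津若松市`, `name:en=Aizuwakamatsu`, `name:ja=会津若松市` and languages `["en", "de"]`, this yields only `name` and `name_en`, matching the English/German example above.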

pnorman, Nov 15 '24 21:11

For what it's worth, this is also what's currently stopping me from switching to Shortbread for the OpenInfraMap basemap, as I select label languages based on browser language. (I could tweak the schema myself, but I haven't quite had the energy to do that yet.)

I think the tricky part is deciding which languages to support, because we probably don't want to include all of them...

russss, Nov 19 '24 13:11

> It will be up to the tileset authors which languages they want to support. This will gracefully degrade because not supporting a language is the same as that language not being on an object.

This is more or less the same approach as in the OpenMapTiles ecosystem. Although that schema does specify a set of languages, in practice, tools like Planetiler accept an override list that tile hosts expand considerably. For example, osmus/tileservice#24 exposes literally any name:*=* subkey that conforms to BCP 47.

Avoiding a specified list of languages also enables a tile provider to optionally resolve the name field to a user-requested language on demand instead of sending all the name fields to the client to resolve. This requires some extra middleware that can complicate a server-side setup, so the schema shouldn’t effectively require it.
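A rough sketch of the per-feature step such middleware might perform, assuming it has already decoded one feature's properties into a dict (decoding and re-encoding vector tiles is out of scope). `resolve_name` is a hypothetical helper.

```python
from typing import Dict, List


def resolve_name(props: Dict[str, str], requested: List[str]) -> Dict[str, str]:
    """Replace `name` with the first requested language the feature has,
    then drop every `name_*` property so only one name reaches the client."""
    resolved = {k: v for k, v in props.items() if not k.startswith("name_")}
    for lang in requested:
        value = props.get(f"name_{lang}")
        if value:
            resolved["name"] = value
            break
    # Non-name properties pass through unchanged; if no requested language
    # is present, the original `name` is kept.
    return resolved
```

Because the resolved value lands in the plain `name` field, the client style does not need any language logic at all.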

> I think the tricky part is deciding which languages to support, because we probably don't want to include all of them...

This would be up to you, but in my experience, there isn’t really much downside to just including all of them. Tile sizes don’t really increase much, because the localized names mostly only occur on place points, which aren’t terribly abundant. Even in officially multilingual countries, roads and buildings rarely have more than two languages on them, while POI names typically remain untranslated.

1ec5, Jul 26 '25 20:07

My proposal for voting is to implement the above. Where features have names, a "name" attribute is required if there is a name tag, and "name_xx" attributes are optional, where "xx" is an IETF language tag.

pnorman, Sep 12 '25 00:09

I am very much in favour of this, as it will ensure that the maps can be used worldwide. Do you have any concerns, @joto?

MichaelKreil, Oct 23 '25 21:10

As I see it, there are two different use cases for multilingual maps. One is where the user can switch between languages, the other is where the map maker decides which language(s) to use. For the first one you need to switch between different tags in the viewer; for the second it's much easier to do this server side, and you'll have smaller tiles and simpler styles. We need something that supports both use cases.

In both cases there isn't a simple relationship from the *name* tags to the labels. Creating good labels from tags is difficult: you have to take into account where the label is in the world, what scripts a user will likely be able to read, and so on. And in general there isn't just a single label but possibly several, so that you can show, say, the local name with the English name below it in italics.
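The "local name plus a second name below it" case can at least be sketched, assuming the same `name` / `name_<code>` attributes. This deliberately ignores where the feature is and which scripts the reader knows, which is exactly the hard part described above; `label_lines` and `gloss` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple


def label_lines(props: Dict[str, str], lang: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (primary, secondary) label lines.

    Primary is the local `name`; secondary is `name_<lang>` when it exists and
    differs, e.g. for rendering below the primary line in italics.
    """
    local = props.get("name")
    translated = props.get(f"name_{lang}")
    if local and translated and translated != local:
        return local, translated
    return local or translated, None


def gloss(props: Dict[str, str], lang: str) -> Optional[str]:
    """Single-line variant of the same idea, e.g. "Aachen (Aix-la-Chapelle)"."""
    primary, secondary = label_lines(props, lang)
    if primary and secondary:
        return f"{primary} ({secondary})"
    return primary
```

Suppressing the second line when both names are identical avoids labels like "Berlin (Berlin)".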

I think Shortbread should allow for the different use cases in some form and not force a simplistic mapping where every name:XX tag becomes a name_XX property.

joto, Oct 24 '25 13:10

Shortbread currently supports name, name_de and name_en, which works perfectly - except that 94% of people do not have German or English as their first language. So it is very crucial to support more languages! Could you provide a specific example of where allowing additional name_?? attributes would cause an issue?

MichaelKreil, Oct 24 '25 14:10

> Could you provide a specific example of where allowing additional name_?? attributes would cause an issue?

I am not saying this would cause an issue, I am saying the whole way this subject is being approached is simplistic and we should come up with something better.

joto, Oct 24 '25 15:10

I think the proposed solution is both simple and effective. I can't think of a better approach. We should also start supporting multiple languages as soon as possible, since offering only German and English for a project that is supposed to be worldwide is really embarrassing.

MichaelKreil, Oct 24 '25 18:10

After some more discussions elsewhere, here is an idea:

  • The current situation looks like we are only supporting German and English, but that never made any sense. It was clearly meant that these are just two examples, and everybody who implemented Shortbread and needed other languages could just extend that to more languages. Let's not make a big deal out of this and treat it like the badly written spec that it is: you can clearly make any name_XX from any name:XX, so let's make it clear that that's inside the spec.
  • We still should think about better solutions for version 2.

joto, Oct 24 '25 18:10

> As I see it, there are two different use cases for multilingual maps. One is where the user can switch between languages, the other is where the map maker decides which language(s) to use. For the first one you need to switch between different tags in the viewer; for the second it's much easier to do this server side, and you'll have smaller tiles and simpler styles. We need something that supports both use cases.

These days, the scenario in highest demand is for a map that chooses a language based on the user’s preference, but not necessarily switching dynamically. Stuffing all the names into the tiles is one approach for enabling this use case. That should be fine for most implementers. Another approach is to rely on middleware to filter out irrelevant languages based on the request parameters. This results in slightly smaller tiles and avoids the need for dynamic “runtime styling” code on the client side, but it requires additional server infrastructure beyond a simple HTTP server or PMTiles file.
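A sketch of the filtering variant, assuming the middleware reads the standard HTTP `Accept-Language` request header and works on one decoded feature's properties at a time. The header parsing is simplified, and `parse_accept_language` and `filter_names` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Language tags from an Accept-Language header, highest q-value first.

    Simplified: "*" is ignored, an unparseable q-value counts as 0, and
    ties keep their original order.
    """
    weighted: List[Tuple[float, int, str]] = []
    for index, part in enumerate(header.split(",")):
        tag, *params = [p.strip() for p in part.split(";")]
        if not tag or tag == "*":
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            weighted.append((-q, index, tag))
    return [tag for _, _, tag in sorted(weighted)]


def filter_names(props: Dict[str, str], languages: List[str]) -> Dict[str, str]:
    """Keep `name` and only the `name_*` properties the client asked for.

    Also keeps the bare primary subtag (de for de-CH). Keys are matched
    case-sensitively, which is an assumption about how tiles spell tags.
    """
    keep = set()
    for lang in languages:
        keep.add(f"name_{lang}")
        keep.add(f"name_{lang.split('-')[0]}")
    return {k: v for k, v in props.items() if not k.startswith("name_") or k in keep}
```

Unlike resolving to a single `name`, this keeps the language-specific properties, so the same client style works whether or not the middleware is in place.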

> It was clearly meant that these are just two examples, and everybody who implemented Shortbread and needed other languages could just extend that to more languages.

This sounds great. I’d suggest imposing as few constraints as possible on IETF language tags – the naming convention for name:*=* subkeys. Implementers should be allowed to output any valid BCP 47 code, no matter how obscure, even if other implementations would omit it. OpenMapTiles similarly only documents English and German as examples, but these days most major tile providers include plenty more languages. In fact, Planetiler-powered OMT deployments, such as the OSMUS Tileserver, expose literally any valid localized name subkey, as long as it’s tagged explicitly. This absolves clients from having to hard-code their own lists of supported languages, as we saw in https://github.com/openstreetmap/openstreetmap-website/pull/4042#discussion_r1216877551. (The OSM website is still relying on a hard-coded list maintained by MapTiler outside of source control.)
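Accepting "any valid BCP 47 code" instead of an allow-list could look roughly like this. The pattern is a simplified version of the "langtag" production in RFC 5646 (BCP 47): it checks syntax only, not whether subtags are registered, and it deliberately rejects grandfathered and private-use-only tags. `LANGTAG` and `localized_name_attributes` are hypothetical names.

```python
import re
from typing import Dict

# Simplified RFC 5646 "langtag" syntax, matched case-insensitively.
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})   # language (+ up to 3 extlang)
    (?:-[a-z]{4})?                                # script, e.g. Hant
    (?:-(?:[a-z]{2}|[0-9]{3}))?                   # region, e.g. CH or 419
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*      # variants, e.g. fonipa
    (?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*           # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                    # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)


def localized_name_attributes(tags: Dict[str, str]) -> Dict[str, str]:
    """Map every name:<syntactically valid tag> to name_<tag>, no allow-list."""
    attrs: Dict[str, str] = {}
    if "name" in tags:
        attrs["name"] = tags["name"]
    for key, value in tags.items():
        prefix, sep, subkey = key.partition(":")
        if prefix == "name" and sep and LANGTAG.fullmatch(subkey):
            attrs[f"name_{subkey}"] = value
    return attrs
```

Under this check `name:zh-Hant` and `name:en-fonipa` pass through, while `name:pronunciation` does not, since "pronunciation" is too long to be a language subtag.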

As an aside, I’ve also been looking into the requirements for making MapLibre-powered maps more screenreader-accessible. A minor part of that is ensuring that any phonetic transcriptions in OSM make it to the client side to override default TTS engine behavior. In OSM, the most common tagging scheme for phonetic transcriptions is name:pronunciation=*, while an alternative name:*-fonipa=* scheme is more interoperable because it’s based on an industry standard. As long as the Shortbread spec allows any valid BCP 47 code to follow name:*, then implementers will have the ability to expose name:*-fonipa properties as a matter of course, and map–screenreader bridges can simply look for those properties.
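As one illustration of how a screenreader bridge might consume such properties, the sketch below wraps a label in the SSML 1.1 `<phoneme alphabet="ipa" ph="...">` element when a transcription exists. SSML is an assumption here (whether a given screen reader or TTS engine accepts it is outside this sketch), as are the `name_<lang>-fonipa` property name (the name:XX to name_XX convention applied to a `-fonipa` subkey) and the hypothetical `ssml_label` helper.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr


def ssml_label(props: Dict[str, str], lang: str) -> Optional[str]:
    """SSML for the label in `lang`, with an IPA hint when one is available."""
    localized = props.get(f"name_{lang}")
    text = localized or props.get("name")
    if not text:
        return None
    # Only trust name_<lang>-fonipa when the spoken text is name_<lang> itself,
    # so a transcription is never paired with a different name.
    ipa = props.get(f"name_{lang}-fonipa") if localized else None
    if not ipa:
        return escape(text)
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'
```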

/ref openmaptiles/openmaptiles#1769

1ec5, Oct 27 '25 02:10