
Consider adding a dedicated "en-US" locale & mark "en" as international English

Open tyrasd opened this issue 2 years ago • 16 comments

Currently, the strings in the en locale play a special double role: On the one hand, they are the basis for all other translations and need to work for people who have set their computers to "generic" English everywhere on the planet. On the other hand it also needs to work as the locale for users who explicitly have set American English (en-US) as their language.

Usually, this is not a problem because most differences are relatively minor (e.g. spelling of Color vs. Colour) and people using the generic en don't normally care about that. But there are also edge cases where it doesn't work so well. For example, see https://github.com/openstreetmap/id-tagging-schema/issues/287#issuecomment-1014929030 or https://github.com/openstreetmap/id-tagging-schema/pull/288#issuecomment-985517274.

I think we can improve the situation by introducing a dedicated en-US translation on transifex and declaring the en locale to be a generic "international" English. That way Americans can get dedicated texts (just like en-GB provides such an optimized user experience for British English speakers). At the same time, international users and translators would get more broadly applicable labels.

tyrasd, Jan 18 '22 09:01

What is "international English"? I was under the impression that en-US is the international English, at least on the internet.

westnordost, Jan 18 '22 13:01

I think changing the way translation has been done for almost 10 years just so you can have a better name for a "Layby" preset is the wrong thing to do.

bhousel, Jan 18 '22 14:01

By the way regarding layby, StreetComplete translated "de:Parkbuchten" to "[Parking] adjacent to roadway only in dedicated spaces" because in en-US there is no good short word for it (IIRC). (And even in German, you'll often read "Parken auf dem Seitenstreifen" but you are still supposed to tag street_side)

westnordost, Jan 18 '22 14:01

Related issue with the so-called data items (in the wiki): British vs American / US English

Hufkratzer, Jan 18 '22 14:01

We could even just make it so the parking combo field that is local to UK includes the layby option, and the parking combo field outside the UK doesn't include the option.
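A minimal sketch of that suggestion, assuming each combo option can carry an optional set of country codes where it is offered. The `countries` key, the option values, and the `options_for_country` helper are hypothetical illustrations, not the actual id-tagging-schema field format.

```python
# Hypothetical combo-field options. An option without "countries" is shown
# everywhere; otherwise it is shown only in the listed countries.
PARKING_OPTIONS = [
    {"value": "street_side", "label": "Roadside Parking"},
    {"value": "layby", "label": "Layby", "countries": {"gb"}},
]


def options_for_country(options, country_code):
    """Return the options to display for an ISO 3166-1 alpha-2 country code."""
    cc = country_code.lower()
    return [
        opt
        for opt in options
        # Unrestricted options pass; restricted ones need a matching country.
        if "countries" not in opt or cc in opt["countries"]
    ]


if __name__ == "__main__":
    print([o["value"] for o in options_for_country(PARKING_OPTIONS, "GB")])
    print([o["value"] for o in options_for_country(PARKING_OPTIONS, "US")])
```

Keeping the restriction on the option rather than in the translation means the en label never has to pick between dialects; the regional term simply appears only where it is used.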

bhousel, Jan 18 '22 14:01

What is "international English"?

Sorry for being a bit fuzzy on this one. My idea is still not 100% concrete in my head, but I was thinking about something a bit like https://simple.wikipedia.org: Something which is more accessible to a broader spectrum of people (including non-native speakers), and less restricted to a specific single dialect of the language. This language version would be free to include more broadly known terms, disambiguations, etc. to make the presets as clear as possible. As an example, let's say we want to make the keys of the drink:* tags translatable. Then drink:soft_drink could be labeled as Soft Drink in "international" en English, while the en-US American English version would be able to use Soda Pop for this tag.

Btw: A similar solution was also proposed by @zekefarwell in https://github.com/openstreetmap/id-tagging-schema/pull/288#issuecomment-986055839.

tyrasd, Jan 18 '22 15:01

Then drink:soft_drink could be labeled as Soft Drink in "international" en English, while the en-US American English version would be able to use Soda Pop for this tag.

No, the label "Soft Drink" would be perfectly understood by everyone in en-US, and "Soda Pop" is a more regional phrase.

bhousel, Jan 18 '22 15:01

To be clear: the soft_drink example was just a hypothetical example to illustrate the point. A more practical example would be to use Freeway / Interstate for highway=motorway.

tyrasd, Jan 18 '22 17:01

@tyrasd I see, I get the idea, sounds good.

So, in theory there would be one large en translation file (which is incidentally the default translation) and a few tiny en-US, en-CA, en-GB translation files which just include the differences from the "international/simple English" version.

The practicality of this however hinges on whether transifex supports this, i.e.

  • it is clear for translators that they can and should leave the translation empty if no translation is necessary
  • and/or on exporting, the resulting en-US.json etc. are either merged together from translations from en-US and en or the application (iD) is able to merge these together, i.e. first looks if a translation is available in en-US and if not, takes the translation from en

Same really applies to es and pt, which vary at least as much as variants of English.
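The two options in the list above (merging at export time, or falling back at lookup time) can be sketched as follows. This is a minimal illustration assuming flat key/value catalogs; `CATALOGS`, the string keys, `lookup`, and `merge_for_export` are hypothetical names, not Transifex or iD APIs. An empty string is treated as "no override", which is the convention the first bullet would ask translators to follow.

```python
# Hypothetical catalogs: the full "international" en file plus a small en-US
# file containing only overrides. Keys are illustrative, not real iD keys.
CATALOGS = {
    "en": {
        "presets.highway.motorway": "Motorway",
        "presets.amenity.cafe": "Cafe",
    },
    "en-US": {
        "presets.highway.motorway": "Freeway",
        "presets.amenity.cafe": "",  # left empty: no override needed
    },
}


def lookup(key, locale, catalogs):
    """Return the string for `key`, trying `locale` first, then its base language."""
    chain = [locale]
    base = locale.split("-")[0]
    if base != locale:
        chain.append(base)
    for code in chain:
        value = catalogs.get(code, {}).get(key)
        if value:  # missing or empty means "inherit from the next locale"
            return value
    raise KeyError(key)


def merge_for_export(base, overrides):
    """Build a complete regional file: base strings plus non-empty overrides."""
    merged = dict(base)
    merged.update({k: v for k, v in overrides.items() if v})
    return merged
```

The same per-string fallback would work for small regional es or pt files layered over a shared base file.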

For StreetComplete, this is sadly not possible. POEditor (the translation platform used by the app) does not work like this or offer such a feature. So, the default for StreetComplete is en-US, and en-GB translators copy over most of the strings but change a few.

westnordost, Jan 18 '22 18:01

I think this may be a good idea, but there are two issues that would need to be resolved for it to work.

  1. As I mentioned in #288 (comment), it seems that some operating systems and web browsers don't offer English (International) as an option when choosing a language**. Here are the English variants I'm seeing on Mac OS, for example: [screenshot: English variants on Mac OS]. So to offer an International English variant for all browsers, it seems like iD might need its own language picker UI.
  2. International English is not an actual English language variant that anybody speaks. It is rather a vaguely defined, general concept. A translator can't determine if a term or phrase is correct International English like they can with British English, American English, or Australian English. In this example we can look in a dictionary to see that "lay-by" is the British English term and "turnout" is the American English term, but there is nowhere to look up what the International English term would be. A controlled subset like Basic English might work, but I imagine the limited word list would fail to cover the huge variety of geographic concepts needed for OSM. Neither "lay-by" nor "turnout" is in the Basic English word list, so in Basic English you'd have to write something more verbose like "place for automobiles to pull over".

**Edit: My mistake, Chrome, Edge, and Firefox do offer a plain "English" option in addition to the local variants. Safari does not seem to have this option. [screenshot: Chrome English options]

zekefarwell, Jan 18 '22 21:01

Currently, the strings in the en locale play a special double role: On the one hand, they are the basis for all other translations and need to work for people who have set their computers to "generic" English everywhere on the planet. On the other hand it also needs to work as the locale for users who explicitly have set American English (en-US) as their language.

First of all, I share your frustration about this situation every time I have to be “that guy” in pointing out that something doesn’t jibe with American English. I do so in order to improve the mapper experience among Americans, but my options are limited. Hopefully it doesn’t come across as nationalism or an attempt to impose a dialect upon other linguistic communities.

Usually, this is not a problem because most differences are relatively minor (e.g. spelling of Color vs. Colour) and people using the generic en don't normally care about that. But there are also edge cases where it doesn't work so well. For example, see #287 (comment) or #288 (comment).

I support creating an American English localization solely for terms familiar to American English speakers that would be bizarre to other English speakers, similar to why we’ve created the other regional English localizations. American English speakers would most likely end up getting the right mix of strings from the main English localization with the occasional string overridden by the American English localization.[^roundabout] I see that @zekefarwell has already requested the localization in Transifex, so all that remains is for a project administrator to approve it and add folks as translators. Maybe “corn” can be the first override: https://github.com/openstreetmap/id-tagging-schema/pull/257#discussion_r738106652.

That said, I don’t think creating an American English localization for overrides solves any of the broader problems completely. In software development, there is inherently a development locale and thus a need to choose a locale that’s commonly understood and usable on its own. Many users around the world use software in English despite not residing in a country that uses English as a national language. For example, I recently found that the vast majority of mappers in Vietnam prefer to map in English. These users will likely get the main English localization and have as much right to a clean, intuitive interface as anyone else.

I think we can improve the situation by introducing a dedicated en-US translation on transifex and declaring the en locale to be a generic "international" English. That way Americans can get dedicated texts (just like en-GB provides such an optimized user experience for British English speakers). At the same time, international users and translators would get more broadly applicable labels.

Wouldn’t this “generic international” English end up being the status quo? The tricky situations are about vocabulary, not spelling, but when vocabulary differs between regional dialects, less is more. In https://github.com/openstreetmap/id-tagging-schema/issues/287#issuecomment-1014929030, I pointed out that accompanying “turnout” with “lay-by” would decrease understanding of the preset among all dialects. In an attempt to include everyone, it would underserve everyone. I didn’t want to put too fine a point on it, but “lay-by” is limited to British English and maybe Irish English. In Australia, New Zealand, and South Africa, it means a layaway. A mapper might logically conclude that they’d only use this option for the parking at the side of a department store beside the layaway. Meanwhile, in American and Canadian English, it just sounds inscrutable and odd. This kind of thing happens all the time in English, because it’s such a decentralized, unregulated language. We should develop this project in Esperanto!

It would be technically feasible to change the development language to British English, to superficially align with raw tag spelling conventions, if we think American English is such an outlier among English dialects when it comes to preset terminology. However, I’m not sure that’s the case: the British English localization has needed to override 32% of strings, compared to the much smaller rate of overrides in the Australian English (0.81%) and New Zealand English (1.17%) localizations. If someone were to approve the Canadian English localization, I’m pretty sure it would have even more similarities with American English. Obviously this is a crude metric, since the British English localization team is much larger. But if American English were such an outlier among English dialects, I’d imagine we’d at least see a little more uptake of these regional English localizations in Transifex. Or do Australians typically set British English as a fallback language?
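One way to compute an override rate like the percentages quoted above, sketched under the assumption that both localizations are flat key/value dictionaries. `override_rate` is a hypothetical helper, not necessarily how the Transifex figures were obtained, and the sample strings are made up.

```python
def override_rate(source, regional):
    """Fraction of source strings whose regional translation differs from the source.

    Missing or empty regional entries count as "not overridden", because the
    app would fall back to the source string for them.
    """
    if not source:
        return 0.0
    overridden = sum(
        1
        for key, text in source.items()
        if regional.get(key) and regional[key] != text
    )
    return overridden / len(source)


source = {"a": "Color", "b": "Motorway", "c": "Cafe", "d": "Park"}
en_gb = {"a": "Colour", "b": "Motorway", "c": ""}
print(f"{override_rate(source, en_gb):.0%}")  # 25%
```

Like the figures in the comment, this is crude: it counts keys equally, regardless of how prominent each string is in the UI.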

In some cases, there may be even more lost in translation than there is today by changing the development language. My understanding is that most software translators on platforms such as Transifex are accustomed to translating from an American English locale, due to choices historically made by software developers. If iD chooses a less common development locale, they may interpret the source strings differently than expected.

Sorry for being a bit fuzzy on this one. My idea is still not 100% concrete in my head, but I was thinking about something a bit like https://simple.wikipedia.org: Something which is more accessible to a broader spectrum of people (including non-native speakers), and less restricted to a specific single dialect of the language.

As an early enthusiastic contributor to the Simple English Wikipedia, I would suggest that maintaining a localization in Simple English or Basic English would make OSM’s tagging debates look like an uncontroversial, automated affair by comparison. Considering the nuance of so many OSM tags and their presets, rendering them in Simple English would increase confusion among translators and mappers alike. I know this challenge well, having struggled to translate often highly Eurocentric tagging distinctions into Vietnamese. I have the luxury of leaving untranslated anything I’m unsure about, but that isn’t an option for the main development language.

This language version would be free to include more broadly known terms, disambiguations, etc. to make the presets as clear as possible.

We shouldn’t treat this localization as a scratchpad for translator notes, because it affects the real UI presented to end users who need clarity, not a thesaurus. It would be more effective to allow presets to specify more precise translator comments that Transifex would expose in the right context, along the lines of ideditor/schema-builder#27.

Same really applies to es and pt which varies at least as much as variants of English.

This is a good point: the difficulty of maintaining a region-agnostic localization is not unique to the development language. Historically, for convenience, the software industry used the main Spanish (es) localization to present generic Latin American Spanish and a separate Castilian Spanish (es-ES) localization as an override. These days, the es-419 localization serves as a more specific Latin American locale code, but not every operating system uses it, so a generic localization is still required. In projects I’ve maintained, I’ve seen so much back and forth over the Latin American Spanish localization, because every country has markedly different vocabulary and sometimes humorous conflicts in usage.

[^roundabout]: Incidentally, I nearly took this approach in OSRM due to the word “roundabout”, which some Americans still refer to by the less evocative term “traffic circle”. Thankfully, Americans are gradually learning to say “roundabout”: Project-OSRM/osrm-text-instructions#188.

1ec5, Jan 19 '22 00:01

it seems that operating systems and web browsers don't offer English (International) as an option when choosing a language.

This seems to depend on the OS and/or browser. Because in Firefox (Linux) one can choose a generic English:

Adding a setting in iD to override the browser language would be a solution for this of course, and could also be convenient in other use cases.

tyrasd, Jan 19 '22 09:01

Adding a setting in iD to override the browser language would be a solution for this of course, and could also be convenient in other use cases.

Can you elaborate on what would be solved by making a generic English option available via a custom language switcher? It’s already the case that users will automatically see strings from the development language (en) if their browser sends an unsupported English locale such as Canadian English (en-CA), an intentionally incomplete English locale such as Australian English (en-AU), or an incomplete non-English locale, or if the user explicitly sets the locale URL parameter to en. So regardless, we would have to keep the default localization presentable to end users for these situations, which brings us back to the problem posed in the original post.
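The fallback behavior described above can be approximated by a simple negotiation over the user's ordered language preferences. This is a minimal sketch assuming exact-match-then-base-language matching; `negotiate` and its parameters are hypothetical and are not iD's actual locale code.

```python
def negotiate(preferred, available, default="en"):
    """Pick a locale from `available` for the ordered `preferred` list.

    Each preference is tried as-is, then by its base language
    (e.g. en-CA -> en). If nothing matches, `default` is returned.
    Matching is case-insensitive, as BCP 47 language tags are.
    """
    by_lower = {code.lower(): code for code in available}
    for tag in preferred:
        for candidate in (tag, tag.split("-")[0]):
            match = by_lower.get(candidate.lower())
            if match:
                return match
    return default
```

Whichever locale is chosen, any string it lacks still falls through to en, which is why the en strings have to remain presentable to end users on their own.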

1ec5, Jan 19 '22 17:01

This seems to depend on the OS and/or browser. Because in Firefox (Linux) one can choose a generic English

Oops, I was looking at the "Firefox Language Settings" (for Firefox UI only I guess) which has no generic English option. Now that I'm looking in the "Webpage Language Settings" I see generic English on Firefox (Mac) as well. So it seems only Safari doesn't offer the generic English option.


Can you elaborate on what would be solved by making a generic English option available via a custom language switcher?

A Chrome, Edge, or Firefox user can set their browser to use generic English, and iD would respect that. It seems that a Safari user can only choose localized English variants so they wouldn't be able to use iD in generic English without a custom language switcher. Seems like much less of an issue now that I realize it isn't a problem in Chrome, Edge, or Firefox.

zekefarwell, Jan 20 '22 04:01

Can you elaborate on what would be solved by making a generic English option available via a custom language switcher? […] or if the user explicitly sets the locale URL parameter to en.

At the very least it would be a UI for the existing URL parameter, i.e. it would make it easier to switch to a language different from your OS/browser's default settings (e.g. when one prefers to map with en presets but has their OS set up in another language).
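A sketch of how such a setting could be layered over the existing URL parameter and the browser's language list. The precedence order (URL parameter, then saved setting, then browser preferences) and every name here are assumptions for illustration; iD's real behavior may differ.

```python
def requested_locales(url_params, saved_choice, browser_languages):
    """Return locales to try, most explicit choice first, without duplicates.

    url_params: already-parsed URL parameters (a "locale" key is assumed).
    saved_choice: locale from a hypothetical in-app language picker, or None.
    browser_languages: ordered list such as the browser's navigator.languages.
    """
    ordered = []
    for code in [url_params.get("locale"), saved_choice, *browser_languages]:
        # Skip unset values and codes already queued.
        if code and code not in ordered:
            ordered.append(code)
    return ordered
```

The resulting list would then go through the usual matching and per-string fallback, so the picker only changes which locale is tried first.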

tyrasd, Jan 20 '22 13:01

I guess I don’t see the relationship between such a UI (which would be convenient to be sure) and the original problem statement. Wouldn’t the en locale still do double duty as both a user-facing locale and as an intermediate locale for translators? Or have we moved past the idea of maintaining English dialect neutrality?

1ec5, Jan 22 '22 08:01

Related issue with the so-called data items (in the wiki): British vs American / US English

There seems to be an issue in the wiki’s configuration that’s breaking the en-us locale code in Wikibase, making it harder to access for most users. Unless we can get to the bottom of the issue, this probably means en should be used as the code for American English there, not British English (which already has a working en-gb) or an unspecified English dialect, at least when it comes to documentation lookups.

1ec5, May 25 '23 04:05

breaking the en-us locale code in Wikibase

Not sure if I understand the problem correctly. At first glance, it seems to me that requesting en-us locale does return the correct American English description of the tag:

[screenshot: en-us description returned by the wiki]

So, you mean that there is an issue with the user interface of the wiki and because of that the wikibase en should contain American English content instead of British English descriptions? I'd say that's something for the maintainers of the wiki to decide. :wink:

tyrasd, May 25 '23 18:05

I don’t think there’s anything actionable for iD here. I was responding to an earlier question about whether iD should assume the en labels are in British English or something other than American English, which seems impractical.

/ref https://github.com/openstreetmap/id-tagging-schema/issues/357#issuecomment-1015465856

1ec5, May 25 '23 21:05