
Support for non-English interface language (i18n)

Open Krinkle opened this issue 5 years ago • 21 comments

From the README / secondary goals section. Creating a ticket for it here so as to consolidate ideas and discussion prior to offering help implement this :)

Krinkle avatar Dec 31 '19 02:12 Krinkle

Some notes:

  • Processing logic. I imagine the main complication will be how to handle pluralisation across languages. E.g. in English one might get away with "X new message(s)", but this gets significantly more complicated in languages where the noun varies based on whether the number is zero, one, two, three, under 10, divisible by 11, or over 100, etc. For this my main experience is with using CLDRPluralRuleParser. Intl.PluralRules also exists now, though: it has reached Stage 4 for inclusion in ECMAScript 2020, with all major browsers implementing it as of a few months ago.

  • Message format. To substitute message parameters and handle pluralisation, we'll need to decide what syntax to use in the messages, i.e. the syntax which translators will read and write, and which is exposed to the processing logic. For this my main experience has been with "Banana", a message format originally created for Wikipedia's backend software (MediaWiki/PHP), but now also usable through standalone JavaScript implementations (such as jquery.i18n). E.g. a message could look like 'Hello $1, you have $2 new {{plural:$2|message|messages}}.'.

  • Translations. To lower the barrier for users to contribute translations (not requiring use of GitHub or JSON etc.), we could integrate with translatewiki.net. See https://translatewiki.net/, https://translatewiki.net/wiki/Translating:New_project. This basically means they git-pull once a day to sync the English source messages, after which they automatically export any new translations that have been submitted and peer-reviewed by our translators there, via a pull request.
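As a rough illustration of the first two notes, here is a minimal sketch that resolves a Banana-style {{plural:...}} placeholder using the built-in Intl.PluralRules. It is a simplified stand-in, not the parser used by jquery.i18n or MediaWiki: the names pluralForm and formatMessage are made up for this example, and it assumes the plural forms in a message are listed in CLDR category order (zero, one, two, few, many, other).

```javascript
// Hypothetical helper names; a simplified sketch, not the real Banana parser.
const CLDR_ORDER = ['zero', 'one', 'two', 'few', 'many', 'other'];

// Pick the plural form for `count`, assuming `forms` is listed in CLDR
// category order for the given locale (for English: one, other).
function pluralForm(locale, count, forms) {
  const rules = new Intl.PluralRules(locale);
  const available = rules.resolvedOptions().pluralCategories;
  const categories = CLDR_ORDER.filter((c) => available.includes(c));
  const index = Math.max(0, categories.indexOf(rules.select(count)));
  // If fewer forms than categories were supplied, reuse the last form.
  return forms[Math.min(index, forms.length - 1)];
}

// Replace $1, $2, ... and {{plural:$N|form|form}} in a single pass.
function formatMessage(locale, template, ...params) {
  const pattern = /\{\{plural:\$(\d+)\|([^}]*)\}\}|\$(\d+)/gi;
  return template.replace(pattern, (match, pluralParam, forms, plainParam) => {
    if (pluralParam !== undefined) {
      const count = Number(params[pluralParam - 1]);
      return pluralForm(locale, count, forms.split('|'));
    }
    const value = params[plainParam - 1];
    // Leave unknown placeholders untouched so mistakes stay visible.
    return value === undefined ? match : String(value);
  });
}

const template = 'Hello $1, you have $2 new {{plural:$2|message|messages}}.';
console.log(formatMessage('en', template, 'Ada', 1));
console.log(formatMessage('en', template, 'Ada', 3));
```

With the English rules this prints the singular form for 1 and the plural form for 3. A language with more categories would simply list more forms between the pipes.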

Krinkle avatar Dec 31 '19 02:12 Krinkle

@nolanlawson Is there something I can do as a next step toward this?

I'd also like to contribute a Dutch translation which in turn might make it easier to recommend Pinafore and Mastodon more generally toward my non-technical friends/family in the Netherlands.

Krinkle avatar Oct 13 '20 04:10 Krinkle

I think at this point it would be very difficult to get this working properly. Especially the fact that we're stuck on Svelte v2 means that there is no easy way to benefit from Svelte 3 and whatever i18n solutions that community may have come up with. (There was nothing for Svelte/Sapper in the Svelte 2 ecosystem at the time I first wrote Pinafore.)

One reason I was really hesitant to go down this road is that maintaining many translations is an arduous task, and I never felt that I would be able to handle the time commitment. That's kind of why I put it under the "possible" future goals.

OTOH if Pinafore were refactored such that all English strings were pulled out into a single JSON/JS file, and then simply imported into where they're needed, then I'd be fine with that. Then presumably people could run their own Pinafore servers and swap out the translation file, and long-term maybe we could figure out how to actually manage translations on the client or server side. (Server-side would be tricky since it's a static site.)

nolanlawson avatar Oct 17 '20 21:10 nolanlawson

@nolanlawson I don't remember building something user-facing in the past decade that wasn't localised. For me that means I'm very familiar with the "how", yet, on the other hand, very blind to the cost 🙂 Could you elaborate on the kinds of maintenance difficulties you'd like to avoid?

I'm open to anything, but my go-to approach would be to move the strings out, and have the application code call a function that's given a message key, with some optional parameters if needed, and get a string in return. E.g. "You have " + num + " message(s)" might become msg("new-messages", num).

Messages would default/fallback to English so day-to-day we wouldn't need to do much other than edit an i18n/en.json file. Depending on how we handle contributions for translations, you'd either leave it up to pull requests by others to improve/add missing translations, or let something like translatewiki.net do its thing (which would pull the repo daily for any new/changed source messages, and provide a PR with any added/changed translations).
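A minimal sketch of that msg() idea with English fallback might look like the following. The catalogue contents, the createMsg name, and the $1-style parameters are assumptions made for illustration; nothing here is taken from Pinafore's code.

```javascript
// Hypothetical message catalogues; in practice these could live in files
// such as i18n/en.json and i18n/nl.json.
const catalogues = {
  en: {
    'new-messages': 'You have $1 message(s)',
    'settings-title': 'Settings'
  },
  nl: {
    'settings-title': 'Instellingen'
  }
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Build a msg() function for one locale. Lookups fall back to English, and
// finally to the key itself so that a missing string is easy to spot.
function createMsg(locale, source = catalogues) {
  const local = source[locale] || {};
  const fallback = source.en || {};
  return function msg(key, ...params) {
    let template = key;
    if (has(local, key)) template = local[key];
    else if (has(fallback, key)) template = fallback[key];
    return template.replace(/\$(\d+)/g, (match, n) =>
      params[n - 1] === undefined ? match : String(params[n - 1]));
  };
}

const msg = createMsg('nl');
console.log(msg('settings-title'));  // a Dutch translation exists
console.log(msg('new-messages', 4)); // no Dutch entry, falls back to English
```

Application code only ever sees msg(key, ...params), so the storage format and the fallback chain can change later without touching the components.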

There's indeed a trade-off to be made with regards to which languages to bundle, and how much to optimise ahead of time. Rambling out loud for a bit:

  • I suspect the best performance would be achieved by compiling/embedding the translations into the components. This would produce a bundle almost the same as what we have today, hiding the indirection. This seems appealing, but I've not found this to work well in practice. I think it makes it harder to handle message variables, requires additional maintenance for the embedding logic, and would mean one has to create and manage a separate app bundle for each language, plus the complexity of somehow switching between them.
  • The simplest approach (for the developer) would be to bundle all JSON translation files in the bundle, and access the messages at run-time. This might actually be feasible for Pinafore: if there are only a few dozen or hundred distinct messages, the bundle size wouldn't increase very notably. In my experience the runtime cost of accessing an object key and performing some light string manipulation tends to be insignificant. But of course, we should measure this. Every app is different :)
  • If bundling all languages turns out to add too much to the initial download cost, we could bundle only English and fetch other JSON files on-demand. This would mean that, like the separate bundle approach, the user would need connectivity when switching languages. However, unlike with separate bundles, there would be no switching complexity, and it would only be upon the first time a language is switched to. After that you'd keep it in the cache bucket. And upon subsequent updates, you'd always include the current language's JSON file among the preloaded assets.
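The third option (bundle English, fetch the rest on demand, then keep it cached) could be sketched roughly like this. The loader takes an injected fetchJson function so that the sketch does not assume any particular URL layout; the in-memory Map stands in for the service worker cache mentioned above, and all names are hypothetical.

```javascript
// English ships in the bundle; other locales are fetched once and cached.
const bundled = { en: { greeting: 'Hello' } };

// `fetchJson(locale)` is injected by the caller. In a browser it might wrap
// fetch() against wherever the JSON files are hosted (an assumption here).
function createLocaleLoader(fetchJson) {
  const cache = new Map(Object.entries(bundled));
  return async function loadLocale(locale) {
    if (!cache.has(locale)) {
      try {
        cache.set(locale, await fetchJson(locale));
      } catch (err) {
        // Offline or missing file: keep the app usable in English.
        return cache.get('en');
      }
    }
    return cache.get(locale);
  };
}

// Usage with a fake fetcher that counts how often the "network" is hit.
let requests = 0;
const loadLocale = createLocaleLoader(async (locale) => {
  requests += 1;
  return { greeting: locale === 'fr' ? 'Bonjour' : 'Hola' };
});

loadLocale('fr')
  .then(() => loadLocale('fr'))
  .then((messages) => console.log(messages.greeting, 'after', requests, 'request(s)'));
```

This sketch does not deduplicate two simultaneous first requests for the same locale; caching the pending promise instead of the result would be one way to handle that.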

Having an optional file or directory with message overrides would be cool indeed. That would e.g. allow for thematic or fandom customisations whilst still making it easy to upgrade.

Krinkle avatar Oct 17 '20 22:10 Krinkle

I think the approach of one-JSON-file-per-language, setting English as default and loading other languages on-demand, would make the most sense. For some rare cases where we do SSR (e.g. the non-logged-in view), there would be a flash of English text, but I don't think this is a huge deal. Open question is whether we would figure out the language based on the browser's declared language preferences, or just let people configure it from the settings.
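The two options in the open question are not mutually exclusive. Here is a minimal sketch, assuming a hypothetical pickLocale helper and a made-up list of supported locales, in which an explicit setting wins and the browser's declared preferences are the default:

```javascript
// Choose the interface locale: an explicit user setting wins, then the
// browser's ordered preferences, then the default. All names are hypothetical.
function pickLocale(supported, preferred, override, fallback = 'en-US') {
  const candidates = override ? [override, ...preferred] : preferred;
  for (const tag of candidates) {
    if (supported.includes(tag)) return tag;   // exact match, e.g. "fr"
    const base = tag.split('-')[0];
    if (supported.includes(base)) return base; // "fr-CA" falls back to "fr"
  }
  return fallback;
}

const supported = ['en-US', 'fr', 'es'];
console.log(pickLocale(supported, ['fr-CA', 'en-US'])); // "fr"
console.log(pickLocale(supported, ['de-DE', 'es-MX'])); // "es"
console.log(pickLocale(supported, ['fr-CA'], 'es'));    // "es"
console.log(pickLocale(supported, ['ja']));             // "en-US"
```

In a browser the second argument would be navigator.languages and the third whatever the user saved in the settings, if anything.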

In terms of maintenance cost, I'm mostly thinking of the fact that whenever you add a new string, you potentially have to add it in multiple languages. Or else the translations end up being 85% that language and 15% English fallbacks, which is not ideal.

nolanlawson avatar Oct 24 '20 16:10 nolanlawson

Hm, Vercel seems to have internationalization for Next.js, but nothing available for static sites. So yeah I think we would have to do client-side-only sadly.

nolanlawson avatar Nov 25 '20 17:11 nolanlawson

After thinking about it and reading this guide, here's what I think we should do:

1. Move current English strings to en.json
2. Use the built-in i18n API for plurals and such
3. Have a build step that turns it into multiple ES module files
4. Import those ES modules into the components. This will ensure we get perfect code-splitting, no big JSON file for the user to download
5. Build completely separate apps for different languages with separate origins, e.g. fr.pinafore.social, es.pinafore.social. Building it would be like e.g. LANG=fr yarn build.

This option is not ideal (the ideal would be pinafore.social/fr/<rest of route>) but because of how routing works in Sapper, that would be untenable (every route gets its own JS chunk; the chunks would explode). Also I don't even know if Sapper can do routing with the base route being anything other than /.

The separate-origin solution has the downside of every language being its own separate PWA with its separate Service Worker. But we could do client-side redirects to redirect e.g. French users to fr.pinafore.social, and use rel="alternate" so that search engines understand what's going on.
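To make the redirect and rel="alternate" idea concrete, here is a rough sketch. The list of deployed locales, the choice of English on the apex domain, and the function names are all assumptions for illustration.

```javascript
// Hypothetical deployment: the default locale lives on the apex domain and
// every other locale gets its own subdomain.
const DEFAULT_LOCALE = 'en';
const LOCALES = ['en', 'fr', 'es'];

function originFor(locale, domain = 'pinafore.social') {
  return locale === DEFAULT_LOCALE
    ? `https://${domain}`
    : `https://${locale}.${domain}`;
}

// <link rel="alternate" hreflang="..."> tags for the page at `path`, so that
// search engines can relate the per-language origins to each other.
function alternateLinks(path) {
  return LOCALES.map((locale) =>
    `<link rel="alternate" hreflang="${locale}" href="${originFor(locale)}${path}">`);
}

// Where to send a visitor, or null if they are already on the right origin.
function redirectTarget(currentOrigin, preferredLanguages, path) {
  const wanted = preferredLanguages
    .map((tag) => tag.split('-')[0])
    .find((lang) => LOCALES.includes(lang)) || DEFAULT_LOCALE;
  const target = originFor(wanted);
  return target === currentOrigin ? null : target + path;
}

console.log(alternateLinks('/settings').join('\n'));
console.log(redirectTarget('https://pinafore.social', ['fr-FR', 'en'], '/settings'));
```

In the browser, redirectTarget would be called with location.origin, navigator.languages and location.pathname, followed by a location.replace() when the result is not null.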

The other downside is that self-hosters will have to pick one language and deploy that app, or host multiple versions as pinafore.social will.

I'm not super happy with the multi-origin approach, but in any case steps 1-4 need to be done either way. I can get started on that and do a French translation since I speak some French.

nolanlawson avatar Nov 25 '20 20:11 nolanlawson

OK I'm starting to work on moving the English strings over in this branch: https://github.com/nolanlawson/pinafore/tree/intl

For Vercel we can deploy e.g. fr.pinafore.social and in the build script we can read the VERCEL_GIT_COMMIT_REF environment variable to figure out what the target URL is and what locale to use. For self-hosters, it's just LOCALE=fr yarn build (env vars, technique).

I'm still not super happy with the one-PWA-per-locale design, but it's definitely the simplest to implement.

nolanlawson avatar Nov 26 '20 19:11 nolanlawson

I haven't quite understood why a separate origin is an appealing option or even the only easy/fitting option. I don't doubt that it makes sense, I just haven't yet understood what design choices lead to this.

Here's what I (possibly incorrectly) assumed so far:

  • Pinafore is deployed as a static site. As such, it would be hard to support localised initial rendering on the logged-out landing page, unless we require either SSR, or late redirects to an alternate entry file (e.g. index.fr.html or fr.pinafore), or late client-side rendering. None are a great fit.
  • For the day-to-day use, I think we always render client-side, right? Or are there cases where we use a visually-non-empty static HTML file as starting point? I assumed not since we also support things like themes, and menu customisation. This led me to think that all logged-in rendering either happens in the SW or, in some cases (if the SW is cold), happens late on the main thread. Either way would seem to allow for e.g. varying by language through Accept-Language (SW) or navigator.language(s). Possibly with something in the general settings to override its value.

In terms of maintenance cost, I'm mostly thinking of the fact that whenever you add a new string, you potentially have to add it in multiple languages. Or else the translations end up being 85% that language and 15% English fallbacks, which is not ideal.

Do you mean that you'd prefer not to merge PRs that add strings unless they are also set in other languages supported at that time? I don't deny that'd be ideal. But I think we'd be able to reach a wider audience if we allow for translators to catch up on their own time every once in a while.

At Wikipedia we've been deploying releases multiple times a week for many years, with development happening almost exclusively in English (despite many developers not being native speakers, myself included). We then have a nightly import to, and export from, our translation platform. By the time a release goes out a day or two post-commit, there's a good amount of coverage, and it increases as weeks go by. (Major features are opt-in first and generally finish translations prior to launch.)

And yeah, we do use fallback languages (ultimately to English), but partial translations I think tend to be a good thing and better than nothing. It might also play a small role in exposing what goes on behind the scenes and inviting new translators. This is actually how I started translating for Mastodon, I noticed an incomplete translation!

I guess what I'm trying to say is - I hope you won't feel you have to let translations slow down the development. You can always fill in French translations later, or see if someone else wants to pick it up.

Krinkle avatar Nov 26 '20 20:11 Krinkle

@Krinkle

Pinafore is deployed as a static site.

Yes it is. The vercel.json file is basically how we tell Vercel to host the static files. Unfortunately they don't seem to support varying the static files by locale headers.

For the day-to-day use, I think we always render client-side, right?

90% is rendered client-side. The navigation links and other non-logged-in content are rendered server-side, though.

Do you mean that you'd prefer not to merge PRs that add strings unless they are also set in other languages supported at that time?

Eh, I guess I'll merge the PRs. It's a problem, but the Mastodon frontend has the same problem: https://crowdin.com/project/mastodon

I hope you won't feel you have to let translations slow down the development.

I've reconsidered it, and I don't think it'll be a big deal. If the PRs start to become too much to manage, I'll switch to Weblate or Crowdin.

I haven't quite understood why a separate origin is an appealing option or even the only easy/fitting option.

It's really not a great option (compared to pinafore.social/fr/), but it's the easiest given how Sapper and Vercel work. (Also keep in mind we're probably stuck forever on Svelte v2 since the upgrade would take way too much time, which means I've forked Sapper and we basically own it entirely.)

I'm hoping I can figure out a better way in the end, so for now I'm just focusing on getting the English strings out, and then we'll go from there.

nolanlawson avatar Nov 26 '20 20:11 nolanlawson

Some more details of what I'm working on:

  • I chose ICU format because it seems to be the standard
  • I chose to use format-message, because out of the few i18n libraries I looked at on npm, it seems to be the most flexible and the most tree-shakeable. In particular, it allows me to pre-parse the ICU strings ahead of time and inject them directly in the bundle, so that at runtime we only need the formatter code.
  • The formatter code for format-message is also pretty small. And even that small amount can be cut in half by blocking its Intl.PluralRules polyfill since all the browsers we support already support this API. All in all, we're adding about 5kB minified non-gzipped to the total bundle size, plus some minor bloat for JSON format for some formatted messages, which should hopefully be trivial once the PR is complete.
  • I'm using a custom Webpack loader to accomplish this, because there's really no good way to do this properly in Svelte v2. As-is, what I've written is also pretty easy to swap out for a client-side formatting solution if we decide to go that way.
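The split between build-time parsing and a small runtime formatter can be sketched as follows. The nested-array shape below is a made-up stand-in for a pre-parsed ICU message (it is not claimed to be format-message's internal format), and format is a hypothetical runtime that handles only plain arguments and plural with #.

```javascript
// ICU source string, as a translator would write it:
//   "{count, plural, one {# new message} other {# new messages}}"
// Hypothetical pre-parsed form that a build step could inject into the bundle:
const parsed = [
  ['count', 'plural', {
    one: ['#', ' new message'],
    other: ['#', ' new messages']
  }]
];

// Minimal runtime formatter for the structure above. It relies on the
// browser's Intl.PluralRules rather than shipping a polyfill.
function format(parts, args, locale = 'en-US', pluralValue) {
  return parts.map((part) => {
    if (part === '#' && pluralValue !== undefined) return String(pluralValue);
    if (typeof part === 'string') return part;
    const [name, type, options] = part;
    if (type !== 'plural') return String(args[name]);
    const value = args[name];
    const category = new Intl.PluralRules(locale).select(value);
    // Exact matches such as "=0" take precedence over plural categories.
    const branch = options['=' + value] || options[category] || options.other;
    return format(branch, args, locale, value);
  }).join('');
}

console.log(format(parsed, { count: 1 }));
console.log(format(parsed, { count: 5 }));
```

Because the parsing already happened at build time, the code that ships to the browser is just the data structure plus this kind of formatter.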

As for the fr.pinafore.social vs pinafore.social/fr issue, I'm not tackling that for now.

Step 1 is to just get the English strings into a separate file, which is here: https://github.com/nolanlawson/pinafore/blob/intl/src/intl/en-US.js

nolanlawson avatar Nov 27 '20 01:11 nolanlawson

@nolanlawson Awesome, looking good. I like the precompiled approach and use of JS template literal files. Much easier to read and edit than JSON indeed! This may become tricky in the future if/when we use something like Weblate, which has more specific export formats. But.. not a concern for now. Also, we could likely keep it for the source language (en-US) with a small adapter and still have it export translations as JSON. Would be interchangeable from the import/require perspective.

The only suggestion I have for now is to include doc comments in the en-US.js file to describe roughly what each message represents and where it is used. (This'll be easier to do as we go than retroactively, when we might no longer have the intent fresh in memory.) In particular, such documentation would also serve as a reminder that messages should not be glued together or re-purposed in other contexts, as this may be incompatible with translations later on. I've often found that the act of writing the one or two sentences of message doc served as a "too clever detector" for myself.

As an example, the oneHour is something I've seen misused in the past. Documenting it as "a duration, such as for a poll" would ensure we don't re-use it outside the poll feature without double-checking, and also that it is not used for something that is textually identical (in English) but semantically different from "a duration".
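To show what such doc comments could look like, here is a sketch of a documented source-language file. Only the oneHour key is named in this discussion; its wording, the other keys, and all the comments are hypothetical.

```javascript
// Sketch of a documented source-language file. Only `oneHour` is mentioned in
// the thread; its wording and the other keys here are hypothetical.
const messages = {
  // A duration option, such as how long a poll stays open.
  // Do not reuse for relative times like "one hour ago".
  oneHour: '1 hour',

  // Label of the button that publishes a new post from the compose box.
  composePublish: 'Publish',

  // Header of the notifications tab; {count} is the number of unread
  // notifications. Keep the whole phrase in one message so translators can
  // reorder it; never build it by gluing fragments together.
  unreadNotifications: `{count, plural,
    one {# unread notification}
    other {# unread notifications}
  }`
};

console.log(Object.keys(messages).join(', '));
```

A translation tool could surface these comments next to each string, which is where they help most.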

See mediawiki/Localisation for a large resource of best practices, background, and restrictions that we've accumulated from translators over the years.

Krinkle avatar Nov 27 '20 04:11 Krinkle

@Krinkle Adding comments is a good idea, and exporting as JSON as well. Thanks!

After thinking about it, I think I may end up going with a client-rendered solution after all. It's not as good for perf or SEO, and it may result in a "flash of unstyled English", but it's way easier to implement. I can't think of a great way to do either the fr.pinafore.social solution or the pinafore.social/fr solution without rewriting large parts of our Sapper fork.

Someone also reached out to me about translating Pinafore to Farsi, so I've started thinking about RTL, but I think it'll be a big process because most of the UI is hardcoded for LTR.

Either way, I'm making progress on en-US.js.

nolanlawson avatar Nov 27 '20 20:11 nolanlawson

Experimental i18n support is in this PR ^ . Some instructions for contributing translations are in CONTRIBUTING.md.

Right now it's just build-time-only and single-language. I've only added a French translation so far so LOCALE=fr yarn build will build a French version.

My next task is to try to implement a client-side implementation (I think it makes the most sense for Pinafore's architecture as a PWA) and test the performance to see if it's acceptable. The current compile-time solution has basically no impact on performance, which is good.

I'm also trying to figure out how to get right-to-left support in. Right now I'm just doing the bare minimum which is detecting LTR vs RTL at build time and injecting it in the dir attribute on <html>:

https://github.com/nolanlawson/pinafore/blob/0022286b46770260948d4a2f2c69cee3a9aef344/src/build/template.html#L2

Later on, we should use the global process.env.LOCALE_DIRECTION to change some things around to favor RTL rather than LTR.
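A build-time direction check of this kind could be as small as the sketch below. The list of right-to-left languages is a short, non-exhaustive assumption, and the {{lang}} / {{dir}} placeholders are hypothetical rather than what template.html actually uses.

```javascript
// Primary language subtags commonly written right-to-left. This short list is
// an assumption for the sketch, not an exhaustive registry.
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur']);

function localeDirection(locale) {
  const language = locale.toLowerCase().split(/[-_]/)[0];
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}

// Inject the language and direction into an HTML template at build time.
// The {{lang}} and {{dir}} placeholders are hypothetical.
function renderTemplate(template, locale) {
  return template
    .replace('{{lang}}', locale)
    .replace('{{dir}}', localeDirection(locale));
}

const template = '<html lang="{{lang}}" dir="{{dir}}">';
console.log(renderTemplate(template, 'fa'));
console.log(renderTemplate(template, 'fr'));
```

The same localeDirection result could be exposed to components as a constant, which is roughly the role process.env.LOCALE_DIRECTION plays above.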

nolanlawson avatar Nov 29 '20 22:11 nolanlawson

As a modern project, it'd make an interesting case study for using only native CSS for a layout that renders appropriately in either interface language direction. Normally I'd use CSSJanus to produce a flipped version of the stylesheet in which margins, borders, floats, etc. change accordingly.

But we now have Flexbox which provides direction-neutral keywords like start/end/inline/block, and margin and padding support these as well. Even if we need to tune a small handful of rules, we could potentially hardcode those two ways using :dir() which inherits as-needed.

One tricky area that comes to mind is distinguishing interface language from content language (and their direction). I believe, while imperfect, this is already native to Mastodon, right? In that each post has a language associated with it.

Krinkle avatar Nov 30 '20 00:11 Krinkle

@Krinkle If I had been more clever, then yep, I would have used the modern CSS properties rather than margin-left, margin-right, etc. :) As is, something like CSSJanus might be a requirement; I'm not sure.

nolanlawson avatar Dec 13 '20 23:12 nolanlawson

@nolanlawson are you actively accepting more translations? or are you keeping it more restricted because each additional language affects performance?

trosel avatar Feb 17 '22 17:02 trosel

@trosel New languages do not affect performance. Translations are welcome. :)

nolanlawson avatar Feb 18 '22 16:02 nolanlawson

@trosel Ah wait sorry, I was confusing this with another project. The current status of translations is that you can contribute them, but for it to work, you would have to build and host your own version of Pinafore in that language.

nolanlawson avatar Feb 18 '22 16:02 nolanlawson

@nolanlawson what is the other project you were thinking of?

would you recommend putting in the translation work if you were me? or are this app’s days numbered because of it being stuck on an old version of Sapper?

also, do i need a server backend to fork and translate the front end? or is it all client side, so i could just host it on github pages?

trosel avatar Feb 24 '22 21:02 trosel

@trosel emoji-picker-element

I'll be honest; I'm not super active on Mastodon anymore and I've lost a lot of my motivation to work on this project. I used to be motivated because my wife used Pinafore/Mastodon, but she gave up about a year ago.

You don't need a backend or anything. Once you build it, you have basically static files. It needs some lite routing rules, which is what the Node server is for, but frankly you could just skip it and host the files on any old static server and 99.99% of the site would work. The only thing that would break is browsers that don't support service worker, which is not very many of them.

nolanlawson avatar Feb 26 '22 03:02 nolanlawson

Hello:

I prefer to use the same rules as the original project, with headers, so that 100% works. I've hosted the project on Vercel, on a free account, now that the Spanish translation is merged. A fellow has registered the pinafore.es domain and we have redirected the Vercel project there. My suggestion, to make it easier to host different languages, is that in the vercel.json file, builds is removed, since this property is not recommended anymore, and buildCommand is used instead. In this way, people wishing to host different translations will discover that they just need to change the language code. I'll send a PR with just this change, although vercel.json is using deprecated properties, maintained for compatibility, as explained in project config with vercel.json

You may merge it if you want. The Spanish translation is working at

https://pinafore.es

nvdaes avatar Dec 20 '22 06:12 nvdaes