privacytools.io
NEW [WIP] Add localization support with jekyll-simple-i18n
Supersedes #1458. This brings changes from #1503 into the main repo for development.
@djoate:
Preview of index.html translation: https://deploy-preview-1509--privacytools-io.netlify.com/es/ English site: https://deploy-preview-1509--privacytools-io.netlify.com
Sample of partially translated pages:
- https://deploy-preview-1509--privacytools-io.netlify.com/es/
- https://deploy-preview-1509--privacytools-io.netlify.com/es/providers/ (which you can get to by pressing the green Aprenda Más button under Proveedores on the homepage)
- https://deploy-preview-1509--privacytools-io.netlify.com/es/providers/cloud-storage/ (which you can get to by pressing the Servidor Cloud link in /es/providers/)
No other pages have been translated (pages without translate: true in their front matter are not generated).
This is not meant to be a full-fledged PR but rather a proof of concept for a better solution for localizing the site. If this is an acceptable solution, please make this into a branch in this repository so that we can start localizing using this plugin.
The jekyll-simple-i18n plugin (MIT licensed) makes translating the site easier to manage. You should visit the plugin's GitHub repository and read up on it, but here are the features of this plugin from their README:
- No external dependencies. Plugins utilize existing Jekyll features.
- Source strings and page titles can be placed directly in templates for seamless editing and readability.
- A ready-to-translate YAML file that includes all of the canonical source strings is generated every time the site is built.
- It's easy to add new languages. Just create a single file that contains the translated strings. Everything else happens automatically.
- Custom front matter is added to translated pages that can be used within your Liquid templates.
- Optional Transifex integration.
- Built-in support for hreflang tags.
It's based on Transifex, but it can be used with a different service such as Weblate. I've made some modifications to the plugin (e.g. renaming transifex to weblate and handling null source text).
This PR includes an example of index.html and a part of card.html (The "Learn More" button text) being translated. The plugin did not seem to work with the github-pages gem, so github-pages was switched with jekyll gem (which is what the current i18n branch does anyway). You can go to https://deploy-preview-1509--privacytools-io.netlify.com/es/ (or build locally) to see the following:
https://deploy-preview-1509--privacytools-io.netlify.com gives the original English site.
Here is a snippet of the source code for index.html:
<h1 id="sponsors" class="anchor"><a href="#sponsors"><i class="fas fa-link anchor-icon"></i></a> {% t Sponsors%}</h1>
<div class="alert alert-success" role="alert">
<strong>{% t New!%}</strong> {% t Showcase your brand as a sponsor of PrivacyTools here and support our mission of creating a world free of mass surveillance!%} <a href="/{% if page.language %}{{ page.language }}/{% endif %}sponsors/" class="alert-link">{% t Learn more...%}</a>
</div>
A snippet of resources.html:
<p><a href="/{% if page.language %}{{ page.language }}/{% endif %}classic/"><i class="fas fa-info-circle"></i> {% t Prefer the classic site? View a single-page layout.%}</a></p>
<div class="row">
{% capture providers_title %}{% t Providers %}{% endcapture %}
{% capture providers_page %}/{% if page.language %}{{ page.language }}/{% endif %}providers/{% endcapture %}
{% capture providers_description %}{% t Discover privacy-centric online services, including email providers, VPN operators, DNS administrators, and more!%}{% endcapture %}
{% include card.html color="success"
title=providers_title
icon="fas fa-server"
iconcolor="dark"
page=providers_page
description=providers_description
%}
Instead of using keys and two different files, you just wrap the original text in {% t ... %} tags, and the plugin will automatically key that string (with its own ID) into weblate-source-file.yml. If you are trying to translate things inside of a card, you have to do the same thing as before and capture the text first.
The source YAML is generated on build into the root folder of the repo. This source file can then be copied into _data/languages/ and renamed to one of the languages in the language map to set up a translation. This seems much easier to maintain compared to cross-referencing between two different files.
The plugin will also not create multiple keys for duplicates of the exact same string. For example, {% t Worth Mentioning %} will have one key associated with it, and there will only be one key to translate. All other pages that use {% t Worth Mentioning %} will share the same key (however, I've modified it so that, for instance, {% t Worth mentioning %} and {% t Worth Mentioning! %} would have distinct keys).
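A minimal sketch of how such keying could work, assuming whitespace becomes underscores, everything except word characters and . ! ? is dropped, and the key is capped at the 100 characters mentioned under Known issues. The names translation_key and build_source_map are hypothetical and are not taken from the plugin:

```ruby
# Hypothetical helpers for illustration only; this is not the plugin's code.
MAX_KEY_LENGTH = 100 # length cap described under "Known issues"

# Derive a translation key from a source string.
def translation_key(source)
  source.strip
        .gsub(/\s+/, "_")      # runs of whitespace become a single underscore
        .gsub(/[^\w.!?]/, "")  # keep letters, digits, underscores and . ! ?
        .slice(0, MAX_KEY_LENGTH)
end

# Build a key => source mapping; identical strings collapse into one entry.
def build_source_map(strings)
  strings.each_with_object({}) do |source, map|
    map[translation_key(source)] ||= source.strip
  end
end

strings = ["Worth Mentioning", "Worth Mentioning", "Worth mentioning", "Worth Mentioning!"]
puts build_source_map(strings).keys
```

With these rules the two identical strings share one key, while the differently capitalized and punctuated variants each get their own.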
We would have to replace local links with something like this in order to get the right pages (and I believe external links can be wrapped in translate tags without a problem):
<a href="/{% if page.language %}{{ page.language }}/{% endif %}sponsors/" class="alert-link">{% t Learn more...%}</a>
A portion of the source file, weblate-source-file.yml, looks like this:
---
Prefer_the_classic_site?_View_a_singlepage_layout.: |
  Prefer the classic site? View a single-page layout.
Providers: |
  Providers
Discover_privacycentric_online_services_including_email_providers_VPN_operators_DNS_administrator: |
  Discover privacy-centric online services, including email providers, VPN operators, DNS administrators, and more!
Learn_More: |
  Learn More
It's a different format from what is currently in the i18n branch, i.e. it has the format
string_key_id: |
  This is a source string from the site
rather than
"string_key_id": "This is a source string from the site"
If this format doesn't work with Weblate, we can change the plugin so that it generates the latter format.
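As a rough illustration of how small that change would be, the sketch below parses the block-literal format with Ruby's standard YAML library and re-emits it in the quoted format. The helper name to_quoted_yaml is hypothetical, and relying on JSON string quoting to produce valid YAML double-quoted scalars is an assumption that holds for ordinary text:

```ruby
require "yaml"
require "json"

# Hypothetical converter: rewrite a parsed key => string mapping in the
# quoted one-line style. JSON string quoting is used for escaping here.
def to_quoted_yaml(data)
  lines = data.map { |key, value| "#{key.to_json}: #{value.chomp.to_json}" }
  (["---"] + lines).join("\n") + "\n"
end

literal = <<~SOURCE
  ---
  string_key_id: |
    This is a source string from the site
SOURCE

puts to_quoted_yaml(YAML.safe_load(literal))
```

The chomp call drops the trailing newline that the block-literal style adds to each value, so both formats carry the same text.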
A sample translation into Spanish (using deepl.com) can be found in _data/languages/es.yml:
---
Prefer_the_classic_site?_View_a_singlepage_layout.: |
  ¿Prefieres el sitio clásico? Ver un diseño de una sola página.
Providers: |
  Proveedores
Discover_privacycentric_online_services_including_email_providers_VPN_operators_DNS_administrator: |
  Descubra servicios en línea centrados en la privacidad, incluyendo proveedores de correo electrónico, operadores de VPN, administradores de DNS y mucho más!
Learn_More: |
  Aprenda Más
Known issues
- I don't know of a way to make the plugin translate the strings of permalinks. This means that pages such as https://privacytools.io/es/donate will have to stay as /es/donate for now. Update: See comment below since this is actually not an issue.
- Because the plugin uses the actual string as the translation ID/key, there may be collisions (e.g. "Learn more..." and "Learn more!" will have the same ID). To help remedy this, I've modified the plugin so that the IDs preserve capitalization and can also contain periods, exclamation marks, and question marks. I've also modified the plugin so that the max length for an ID is 100 characters.
- I've had to modify the Gemfile to not use the github-pages gem, so github-pages was switched with the jekyll gem (which is what the current i18n branch does anyway). The jekyll-sitemap plugin also had to be explicitly added in order for the site to compile.
- Breadcrumbs: I'm not a Ruby programmer, so I'm not going to try to make a solution for the breadcrumbs.
To reiterate, this is a proof of concept for a better i18n solution. Feel free to add this as a branch if this seems like an acceptable solution.
Deploy preview for privacytools-io ready!
Built with commit e5cfd449806a692721d342f51a4eddf7589ffae7
https://deploy-preview-1509--privacytools-io.netlify.com
The PR description should link to https://deploy-preview-1509--privacytools-io.netlify.com instead of https://deploy-preview-1503--privacytools-io.netlify.com
List of some issues:
- 404.html won't generate for a specific language, even if you key 404.html and set translate: true
- The plugin refuses to translate "Yes" and "No" (workaround in https://github.com/privacytoolsIO/privacytools.io/pull/1510/commits/8b5226d388182d48deee4f0a8aba3c221d05aad1)
Parts 1 to 4 key the entire site, and almost all of the content has been keyed as of part 4. Part 5 is to merge master into the i18n development branch to fix conflicts, and to clean up any loose ends such as breadcrumbs and the language select.
The final check after part 5 is merged should be done on this PR, where we have a preview for the entire i18n-simple branch and all of the parts.
- Part 1 (https://github.com/privacytoolsIO/privacytools.io/pull/1510): Card/cardv2 content, navbar text, misc, and everything under Providers in the navbar.
- Part 2 (https://github.com/privacytoolsIO/privacytools.io/pull/1517): Everything under Browsers in the navbar
- Part 3 (https://github.com/privacytoolsIO/privacytools.io/pull/1518): Everything under Software in the navbar
- Part 4 (https://github.com/privacytoolsIO/privacytools.io/pull/1519): Everything under OS in the navbar
- Part 5 (here): Merging master into i18n-simple to fix conflicts, and fixing loose ends
The plugin will also not create multiple keys for duplicates of the exact same string. For example, {% t Worth Mentioning %} will have one key associated with it, and there will only be one key to translate. All other pages that use {% t Worth Mentioning %} will share the same key (however, I've modified it so that, for instance, {% t Worth mentioning %} and {% t Worth Mentioning! %} would have distinct keys)
This may not be the case, and is breaking Weblate. I'm seeing this 4 times in weblate-source-file.yml, for example:
httpsplay.google.comstoreappsdetails?idorg.torproject.torbrowser_KEY: |
  https://play.google.com/store/apps/details?id=org.torproject.torbrowser
Could not parse translation base file: while constructing a mapping in "<unicode string>", line 2, column 1: About_PrivacyTools_KEY: | ^ (line: 2) found duplicate key "httpsplay.google.comstoreappsdetails?idorg.torproject.torbrowser_KEY" with value "https://play.google.com/store/apps/details?id=org.torproject.torbrowser " (original value: "https://play.google.com/store/apps/details?id=org.torproject.torbrowser ") in "<unicode string>", line 2900, column 1: httpsplay.google.comstoreappsdet ... ^ (line: 2900) To suppress this check see: http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys Duplicate keys will become an error in future releases, and are errors by default when using the new API.
:(
https://gist.github.com/JonahAragon/79cc67b9bbdb30cb73ccbfa2641b17e2
Edit: lol, it's the question mark. You're importing the strings into the regex filter without escaping the characters, hmm...
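To illustrate why an unescaped question mark would produce duplicates, here is a small sketch. How the plugin actually looks up existing strings is an assumption on my part; known_string? is a hypothetical helper, and only Regexp.escape and Regexp#match? are real Ruby APIs:

```ruby
# Hypothetical lookup, for illustration only: is `source` already present
# in the list of known strings? The plugin's real check may differ.
def known_string?(known, source, escape:)
  pattern = escape ? Regexp.escape(source) : source
  regex = Regexp.new("\\A#{pattern}\\z")
  known.any? { |entry| regex.match?(entry) }
end

url = "https://play.google.com/store/apps/details?id=org.torproject.torbrowser"
known = [url]

# Unescaped, "s?" means "an optional s", so the literal "?" never matches
# and the string is not recognised as already known.
puts known_string?(known, url, escape: false) # false
puts known_string?(known, url, escape: true)  # true
```

If a check like this fails to find an existing entry, the same key would be written again on every occurrence, which is consistent with the four duplicates reported above.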
This branch is working on Weblate at https://weblate.nablahost.com/projects/privacytoolsio/website/, and you can register at https://weblate.nablahost.com/accounts/register/ to test it out.
~~Also, the site doesn't build if an empty translation YAML file exists, they need at least 1 translated key. That's probably a bug that could be fixed.~~ Edit: This doesn't appear to be the case anymore. I think it only failed when every language file was blank, because with empty and non-empty files it appears to build as expected.
You're importing the strings into the regex filter without escaping the characters, hmm...
A proposed solution for this is in PR https://github.com/privacytoolsIO/privacytools.io/pull/1524 to encode punctuation (and string length)
To keep track, some things that haven't been keyed or implemented (yet):
- Changing the language setting in the nav bar to refer to the in-repo translations (though this can wait)
- Breadcrumbs
- Terminal commands (such as the ones described under https://deploy-preview-1509--privacytools-io.netlify.com/operating-systems/#os) and Firefox config options. I don't think Linux terminal commands and FF options have translations anyway, so I left them alone.
- ~~The privacy statement (https://github.com/privacytoolsIO/privacytools.io/pull/1525#pullrequestreview-322699154)~~ done in https://github.com/privacytoolsIO/privacytools.io/pull/1509/commits/e53f3f5f9cdb85cf4bfb0215e093a5debdb50439
This branch is working on Weblate at https://weblate.nablahost.com/projects/privacytoolsio/website/, and you can register at https://weblate.nablahost.com/accounts/register/ to test it out.
Weblate seems to be down
@djoate I know, I'm having issues with it not sending emails :(
As a side note, would it be possible to serve Weblate with privacytools.io as the root domain?
Not at this time AFAIK.
I'm not sure what's going on with Weblate, it might be a known issue (https://github.com/WeblateOrg/weblate/issues/3231) or I might just need to look at it tomorrow with fresh eyes. In either case, it allows for anonymous suggestions and it works otherwise, so I'll get registrations figured out at a later time.
Seems like Weblate thinks that there's a newline at the end of every string
I've noticed that for regenerating the source file, it seems the best thing to do is to make sure Jekyll isn't serving the site when the file is regenerated.
e.g.,
- if you try to delete a string while the site is still being served and you save the file, that string won't be removed from the source file until you stop serving the site and re-serve/rebuild the site.
- if you add a new {% t string %} and save it while the site is being served, the new string is only appended to the end of the source file rather than placed near related strings (which affects the Weblate nearby strings view) until you stop serving the site and re-serve/rebuild the site.
I've noticed that for regenerating the source file, it seems the best thing to do is to make sure Jekyll isn't serving the site when the file is regenerated.
I have noticed that as well. Maybe it would be better if we did add the file to .gitignore and have a bot of some kind build the source file and push it to the repo when changes are made. But for now at least I think we can handle it easily manually.
Seems like Weblate thinks that there's a newline at the end of every string
I saw that but didn't get a chance to look further. The issue, I believe, is this space between each key:
About_PrivacyTools_18_KEY: |
  About PrivacyTools

About_the_PrivacyTools_organization_and_contributors_to_the_PrivacyTools_website_communities_and_servicesP_109_KEY: |
  ...
In YAML, the | means "all text until the next key" I believe, so technically the source file also has newlines at the end of each string. It just isn't as noticeable in this format, but becomes noticeable when Weblate converts it into a single-line format.
What I don't know is whether or not the \n actually affects anything. I didn't see it change anything on the site itself (and can't imagine when it would make a difference, since newlines are generally ignored in HTML). But if it does affect things, removing that space between keys in the source would probably fix it.
What I don't know is whether or not the \n actually affects anything. I didn't see it change anything on the site itself (and can't imagine when it would make a difference, since newlines are generally ignored in HTML)
The plugin itself also does .strip on all rendered tag text. Regardless, I think leaving the newline on every string may be confusing for translators, and it would be better to try to get rid of it.
At a second glance with Weblate, I'd agree:
[image]
I've "solved" the email problem temporarily by enabling Sign in with GitHub and GitLab. So feel free to register an account, just don't sign up with email :)
@JonahAragon I looked it up (https://yaml.org/YAML_for_ruby.html#three_trailing_newlines_in_literals). Apparently | keeps a single final newline while |- strips all trailing newlines.
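A quick way to check this with Ruby's standard YAML library; load_value is a hypothetical helper and the key is just the example from the source file above:

```ruby
require "yaml"

# Hypothetical helper: parse a one-entry document that uses the given
# block scalar indicator and return the resulting string value.
def load_value(indicator)
  doc = "About_PrivacyTools_18_KEY: #{indicator}\n  About PrivacyTools\n\n"
  YAML.safe_load(doc).fetch("About_PrivacyTools_18_KEY")
end

p load_value("|")  # "About PrivacyTools\n"
p load_value("|-") # "About PrivacyTools"
```

So having the plugin emit |- instead of | should keep the trailing newline out of what Weblate shows to translators, assuming Weblate handles the |- style the same way.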
@JonahAragon What's the status of this PR? There were some people looking to localize the site recently (https://www.reddit.com/r/privacytoolsIO/comments/enui17/arabic_version_of_privacytoolsio/).
For things to do before this can be considered ready,
- Localize breadcrumbs
- There's still pull request https://github.com/privacytoolsIO/privacytools.io/pull/1535 open that needs to be merged
- I think a policy for pull requests adding content should be made (for example, are contributors expected to tag the new strings they make in pull requests?)
- Maybe look into getting the self-hosted Weblate as a subdomain of privacytools.io
Hi.
Would it be possible to reopen discussions around translations? Could we make it a main priority for the community to make every change needed to get translations working with Weblate?
This issue has been stalled for a long time but is, IMHO, one of the most important ones. Most translations of the website are outdated, and people should not rely on them entirely.
@Booteille My thoughts on the state of translation progress are given at https://github.com/privacytools/privacytools.io/issues/1106#issuecomment-743725471