OpenRefine
Back-end localization
Is your feature request related to a problem or area of OpenRefine? Please describe.
Some strings that appear in the UI are generated in the backend: exceptions, descriptions of operations, documentation of GREL functions, and probably others that I am not aware of. We currently do not have any way to translate these.
Describe the solution you'd like
We should investigate an architecture to make these strings translatable. I can think of two approaches:
- Replace these strings with identifiers (possibly with template parameters in the case of operation descriptions) and defer their formatting to the frontend, so that we can use the existing frontend localization system (Wikimedia jQuery i18n plugin). This would mean that the backend can continue to ignore which language the application is being used in;
- Bring in server-side localization using an appropriate Java library. We should make sure the messages which appear in the backend are picked up by Weblate somehow.
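As a rough illustration of the first approach, the backend could return a message identifier plus its parameters instead of a formatted English string, and leave the formatting to the frontend i18n plugin. The sketch below is only an assumption of what that could look like: the class name `MessageDescriptor`, the key `operation-rename-column` and the JSON shape are hypothetical and not taken from the OpenRefine code base.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value object: a translatable message sent to the frontend
// as an identifier plus positional parameters, never as formatted text.
public class MessageDescriptor {
    private final String key;
    private final List<String> params;

    public MessageDescriptor(String key, String... params) {
        this.key = key;
        this.params = List.of(params);
    }

    // Hand-written JSON to keep the sketch dependency-free; it only escapes
    // backslashes and double quotes, so a real implementation should use
    // a proper JSON library instead.
    public String toJson() {
        String args = params.stream()
                .map(p -> "\"" + escape(p) + "\"")
                .collect(Collectors.joining(","));
        return "{\"key\":\"" + escape(key) + "\",\"params\":[" + args + "]}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Hypothetical key and parameters for a column rename operation
        MessageDescriptor d =
                new MessageDescriptor("operation-rename-column", "name", "title");
        System.out.println(d.toJson());
    }
}
```

The frontend would then look up the key in its existing translation files and substitute the parameters there, so the backend never needs to know the user's language.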
Describe alternatives you've considered
We might use one solution for a particular class of messages (say operation descriptions) and another for other messages (say exceptions).
This is a proposed Outreachy project in 2022. If you are not planning to apply for an internship via Outreachy/GSoC, we kindly ask that you do not work on this task yet, in order to leave the floor to potential interns.
In the enterprise, I've seen lots of usage with Thymeleaf templates which are dead simple to use.
I have proposed this as a potential Outreachy project (and maybe GSoC if we participate in that as well). This means I think it should be a good topic for a three-month internship. It is therefore not suitable as a first issue: have a look at our list of good first issues instead.
Some possible libraries:
- Plain Java with PropertyResourceBundle (since Java 9 it can read UTF-8 encoded .properties files) https://www.baeldung.com/java-resourcebundle
- Rincl might be able to help here (and uses ICU4J). It's Apache 2 licensed and I've reached out to the author to see what's up. https://rincl.io/ UPDATE: The author, Garret, replied to me and will comment in this issue in a day or two.
- Localizer library like Jenkins uses? https://www.jenkins.io/doc/developer/internationalization/i18n-source-code/ and https://github.com/kohsuke/localizer
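For comparison, here is a minimal sketch of the plain Java option, using only `java.util.PropertyResourceBundle` and `java.text.MessageFormat` from the standard library. The message key and the French text are made up for illustration, and the bundle is built from an in-memory string so that the example runs on its own; a real setup would more likely load a .properties file from the classpath with `ResourceBundle.getBundle`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleSketch {
    // Build a bundle from .properties-formatted text. Reading through a
    // Reader avoids any byte-encoding question in this sketch.
    static ResourceBundle load(String propertiesText) throws IOException {
        return new PropertyResourceBundle(new StringReader(propertiesText));
    }

    // Look up the pattern for a key and fill in its {0}, {1}, ... parameters.
    static String format(ResourceBundle bundle, Locale locale,
                         String key, Object... args) {
        MessageFormat mf = new MessageFormat(bundle.getString(key), locale);
        return mf.format(args);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical French translation of an operation description
        String fr = "rename-column=Renommer la colonne {0} en {1}\n";
        ResourceBundle bundle = load(fr);
        System.out.println(
                format(bundle, Locale.FRENCH, "rename-column", "nom", "titre"));
    }
}
```

With this approach the backend has to know the user's locale for each request, which is the main difference from delegating formatting to the frontend.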
Hi, all. @thadguidry got in touch with me as he mentioned. I'm excited that you found my library, Rincl. Let me try to put it in context so you can make an informed decision.
First off, if you go the low-level PropertyResourceBundle route, you'll still need to deal with lots of infrastructure stuff, like loading the info, exposing the data through some API, overriding values when needed, and using the values in messages with templates. That is all the stuff that Rincl does. It provides a common i18n interface (sort of like SLF4J for logging), and using Csar you can even set it up so that this common interface pulls in different resources in different parts of your application based upon thread groups (e.g. if different users are logged in and chose different locales).
Rincl uses a "pull" approach, similar to what you would ultimately do with PropertyResourceBundle. The current version of Rincl doesn't generate classes. It doesn't "inject" i18n resources. Your application has to ask for them via a resource key. But Rincl does help make sure the right resources get loaded, and even allows class hierarchy overrides. For example you can have an application.properties file stored in the package com.example and then override just a few properties in com.example.foo.Bar.properties. When the com.example.foo.Bar class requests resources, Rincl knows to load them from com.example.foo.Bar.properties but fall back to application.properties for those common properties not overridden for Bar.
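The override-with-fallback idea can be illustrated with plain `java.util.Properties`, whose constructor accepts a set of defaults. To be clear, this is not Rincl's API and does not reproduce how Rincl resolves resources; it is only a standard-library sketch of the lookup behavior described above, with made-up keys and values standing in for the contents of application.properties and Bar.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class FallbackSketch {
    // Load .properties-formatted text; keys missing from it are looked up
    // in the given defaults (which may be null).
    static Properties load(String text, Properties defaults) throws IOException {
        Properties props =
                defaults == null ? new Properties() : new Properties(defaults);
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for application.properties (hypothetical content)
        Properties application = load("greeting=Hello\nfarewell=Goodbye\n", null);
        // Stand-in for Bar.properties, overriding a single key
        Properties bar = load("greeting=Hi from Bar\n", application);

        System.out.println(bar.getProperty("greeting")); // overridden for Bar
        System.out.println(bar.getProperty("farewell")); // falls back to application
    }
}
```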
A lot of thought went into Rincl, based on my experience with things like Apache Wicket and JSR 296 for Swing applications (now defunct). It is well-designed and has unit tests. The Java API documentation is second to none. You can read more in the Rincl intro. As for my expertise, you can read my lessons on i18n and Unicode in the course I wrote.
Now for the downsides of Rincl: no one (to my knowledge) uses it (although I use it in several of my own applications and it represents two decades of code that has been refactored out into lightweight, modular libraries). And though I've spent countless hours on it over the years, I currently have a full-time client and it would be hard to devote time to this project at the moment.
On the other hand, after talking to @thadguidry, I realized that it would be nice if someone were to actually use one of my projects, so if you do choose Rincl and run into a show stopper, I'll do my best to unblock you. (Rincl may well have 100% of what you need already.) I can't promise anything, and I don't see it going into "active development" anytime soon. But if (finally) several projects start seeing how great it is 😁 then who knows.
For the meantime let me know if you have any specific technical questions.
In the enterprise, I've seen lots of usage with Thymeleaf templates which are dead simple to use.
@thadguidry With this option, does it mean that the whole application would have to be changed to a Spring Boot application? I am not sure whether it is possible to include the Spring Boot Maven dependency in the pom file and use the relevant dependencies just for the localization functionality, without making changes to other functionality of the application.
I am Elroy Kanye from Cameroon, an intern for OpenRefine at Outreachy Internships in May 2022. For this round of Outreachy, I will be working on this issue.
https://groups.google.com/g/openrefine/c/b0p827Ee-0M/m/h0wu5qBBBAAJ
One of my tasks with Antonin was to identify those parts of OpenRefine which need translation. This required me to work with OpenRefine from a user's perspective. However, I acknowledge that I may not be able to catch everything. That said, I would like to ask users of OpenRefine to help point out the areas of this tool which they would love to see in their language.
Below are some examples I was able to identify:
- [x] Notifications, History Entries
- [x] History entries for Undo/Redo: Every entry in project action history in the undo/redo tab.
- [x] Heads-up notification when an action such as a transform or single-cell edit is made.
- [x] Notifications and histories on column movement or column/cell edit operations
- [x] Functions
- [x] All GREL syntax errors.
- [x] Descriptions of all GREL functions and expressions
- [x] Evaluation errors of all GREL functions and expressions
- [x] Names of facets on reconciliation.
- [x] On project creation with a CSV file lacking headers – default column names are not translated.
In the following weeks, we will be introducing changes in parts of OpenRefine to address the items above. Please let me know if there are other sections you have noticed which need localising.
Good work @elroykanye! This is a long and thorough project that touched a lot of code. Well done.
Regards, Antoine
I've added #5996 to provide developer documentation for how this all works.