
[WIP] Tool Translation support

Open manabuishii opened this issue 7 years ago • 29 comments

manabuishii avatar Jun 27 '17 14:06 manabuishii

Huge thanks to @jmchilton. His initial version and his support have been very helpful.

manabuishii avatar Jun 27 '17 14:06 manabuishii

French

translate_to_french

Japanese

translate_to_japanese

manabuishii avatar Jun 27 '17 15:06 manabuishii

@bgruening It was your recommendation to change this translate function so that it translates the help text after breaking it on new lines. This only makes sense if we encourage tool developers to format their help text with one sentence per line, right? If yes, should we make that an IUC recommendation? We also want to restrict this so it does not convert parts of the rst that aren't simple text, e.g. code formatted examples, right?

Does anyone have any good links about using gettext to perform translation of rst documents?

jmchilton avatar Jul 05 '17 11:07 jmchilton

I've pushed a bunch of changes to this including:

  • I'm not sure the details have been ironed out completely (e.g. how do local admins provide their own translations) so I have hidden this behind a new config variable "enable_beta_tool_translations".
  • Running Galaxy in the tool testing mode will enable this config option by default though so it is easier to see these in action.
  • I've changed the default domain from "default" to "messages" - which seems to be more common in pybabel examples.
  • I've added an example tool with translations as well as the pybabel commands used to generate the mo and po files from messages.pot.
  • I've refactored the changes made during the hackathon so that they work for more than just data parameters.

To quickly test this run Galaxy with:

GALAXY_RUN_WITH_TEST_TOOLS=1 LANGUAGE=fr sh run.sh --reload

and you will see the translate tool in the tool panel has its only parameter label translated.
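For anyone unfamiliar with how a gettext domain is looked up at runtime, here is a minimal sketch using Python's standard gettext module. It is not the code in this PR: the `translate_label` helper and the `locale` directory name are assumptions for illustration, and only the "messages" domain comes from the changes described above.

```python
import gettext


def translate_label(label, language, localedir="locale", domain="messages"):
    """Return the translation of `label`, or `label` itself if none exists.

    `localedir` and this helper are hypothetical. gettext expects compiled
    catalogs at <localedir>/<language>/LC_MESSAGES/<domain>.mo.
    """
    # fallback=True yields a NullTranslations object when no catalog is
    # found, so untranslated tools keep their original labels.
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return catalog.gettext(label)
```

With no catalog on disk the helper simply returns the English label, which is roughly the behaviour one would want for tools that ship no translations.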

There is still a lot to do I think and we could spend weeks on this project - but for me this is now a minimum viable product that I think should be merged - so I'm 👍 on this PR.

@manabuishii if you review these changes, like them, and then rename this PR to remove the "[WIP]" - I will be happy to merge it barring objections from others.

jmchilton avatar Jul 05 '17 13:07 jmchilton

@galaxybot test this.

jmchilton avatar Jul 05 '17 13:07 jmchilton

I like the idea of translating chunks like parameter help text and names :)

I would however avoid trying to translate multi-line text on a naive line-by-line basis - right now there is no convention of not splitting sentences over multiple lines, which would be needed for that to work (in fact the tradition of line wrapping at ~80 characters works against this approach).

Would it be simpler to have the entire help RST content provided in a per language translation file?

peterjc avatar Jul 05 '17 17:07 peterjc

Would it be simpler to have the entire help RST content provided in a per language translation file?

I would think so in the abstract but I'm not aware of a way to support that with the current file formats and gettext based libraries. I'm not sure if doing it in a fashion that seems intuitive is worth the price of developing something custom. There seems to be a lot of tooling developed around the way we are currently doing it.

I would however avoid trying to translate multi-line text on a naive line-by-line basis - right now there is no convention of not splitting sentences over multiple lines, which would be needed for that to work (in fact the tradition of line wrapping at ~80 characters works against this approach).

It would be a very heavy change to start encouraging longer lines in help text blocks. There are a lot of tradeoffs here and I'm not sure which ones to make. This is part of why I marked it as beta - I'd really like to see another similar project that has large blocks of text and see what they do.

jmchilton avatar Jul 05 '17 18:07 jmchilton

I would like to provide some background about the work we have done and why we have done it this way.

The initial goal was to use standard mechanisms and tools to translate Galaxy artifacts (UI, galaxy-tools, training material and so on). In the GNU universe there is a lot of tooling around https://en.wikipedia.org/wiki/Gettext and po files, which can be used by some nice websites and so on to abstract strings from the application. The aim here is that translators do not have to deal with technical details.

So we started to create such po files. It turned out to be complicated (too complicated for a hackathon) to use gettext to convert XML+rst to po, so @manabuishii wrote a Python script to create the po files. At first we took the entire rst as one large text, but we found that this large text ends up as a key in a Python dict at some point, which makes newlines really hard to maintain. At least in our examples it broke things. So we took the one-line-one-string approach, which is/was more stable. I don't think it is that bad, but we probably should do better. There is XML tooling to convert XML to po and I hope that similar things exist for rst. The question for me is how much logic we want to put into Galaxy and which strings end up on the translator side.
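To make the one-line-one-string approach concrete, here is a rough sketch. It is not the hackathon script: the function names are made up for illustration, and a plain dict stands in for a loaded po/mo catalog.

```python
def extract_line_strings(help_text):
    """Collect each distinct non-blank line as one translatable string."""
    strings = []
    for line in help_text.splitlines():
        stripped = line.strip()
        if stripped and stripped not in strings:
            strings.append(stripped)
    return strings


def translate_by_line(help_text, catalog):
    """Translate line by line, keeping indentation and blank lines intact.

    `catalog` is a plain dict used here as a stand-in for a gettext catalog.
    Lines without a translation are passed through unchanged.
    """
    out = []
    for line in help_text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append(line)
            continue
        indent = line[: len(line) - len(line.lstrip())]
        out.append(indent + catalog.get(stripped, stripped))
    return "\n".join(out)
```

Note that this naive version also hands literal blocks and other rst markup to the translator, and any re-wrapping of the help text changes the keys, which are the two concerns raised earlier in this thread.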

If we agree on splitting it line-by-line we should make this an IUC recommendation to keep things clear. We could also skip the entire po idea, but we wanted to stick to standards as much as possible. I had no idea that multi-line strings are not supported, but maybe I'm missing something important. However, I do plan to look at this more next week.

Feel free to merge this, it is a great step and was a nice hackathon project!

ping @yvanlebras as he is also interested in this for Galaxy-E

bgruening avatar Jul 06 '17 20:07 bgruening

Is it practical to split the RST text up sentence by sentence, and then translate sentence by sentence?

My feeling is this would work much better for the translation of existing files than line by line BUT we risk breaking the RST syntax in the process.

Perhaps parsing the RST with docutils would be the correct solution here? (I've got a tiny bit of experience here from hacking together https://github.com/peterjc/flake8-rst-docstrings - but not enough to judge the feasibility of this idea)
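As a rough illustration of the sentence-by-sentence idea (without docutils), something like the following would make the extracted strings independent of how the text was wrapped. This is only a sketch under strong assumptions: the regex is a naive sentence splitter, the function name is made up, and it knows nothing about RST markup, so it does not address the risk of breaking RST syntax.

```python
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# Assumption for illustration only; it will mis-split abbreviations
# such as "e.g. foo" and ignores RST constructs entirely.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def paragraph_sentences(text):
    """Split plain prose into sentences, ignoring how lines were wrapped."""
    sentences = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Undo hard wrapping by collapsing whitespace runs to one space.
        unwrapped = " ".join(paragraph.split())
        sentences.extend(s for s in _SENTENCE_END.split(unwrapped) if s)
    return sentences
```

The same paragraph wrapped at 30 or at 80 characters gives the same list of sentences, so existing help text would not need to be reformatted to one sentence per line.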

peterjc avatar Jul 06 '17 21:07 peterjc

(I don't want to be negative here - I was suggesting skipping RST content for now, and only translating the simple strings like input parameters etc - which would still be a big step forward)

peterjc avatar Jul 06 '17 21:07 peterjc

Thanks for this work @jmchilton and discussion @bgruening @peterjc. We have to take time with @ThimotheeV to work on it!

yvanlebras avatar Jul 06 '17 21:07 yvanlebras

Hello, I am new to using the website but I learn quickly. We can start with the poster as an easy job that will give us a push forward.

Abdelazeem-Elhabyan avatar Nov 08 '17 21:11 Abdelazeem-Elhabyan

  • Picture of the community
  • Logo for the community and the project
  • Date and place of the first meeting
  • Founders
  • What is the Galaxy project? Use the description best suited to your community.
  • Why use Galaxy?
      • Accessible: Users without programming experience can easily specify parameters and run tools and workflows.
      • Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis.
      • Transparent: Users share and publish analyses via the web and create Pages, interactive, web-based documents that describe a complete analysis.
  • More resources: project website, contacts for the community

Do you think this would be adequate for the logo, or should we add or change things? @yvanlebras

Abdelazeem-Elhabyan avatar Nov 09 '17 19:11 Abdelazeem-Elhabyan

مشروع الجالاكسي للمعلوماتية الحيوية.docx ("The Galaxy Project for Bioinformatics") @yvanlebras Can you check whether we can reproduce this in other languages?

Abdelazeem-Elhabyan avatar Nov 10 '17 10:11 Abdelazeem-Elhabyan

@Abdelazeem-Elhabyan What is the value of your LANGUAGE or LANG environment variable? I guess your language is Arabic?

I'm Japanese and using "ja_JP.UTF-8"
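For reference, Python's gettext looks at LANGUAGE, LC_ALL, LC_MESSAGES and LANG in that order and uses the first non-empty one, which may be a colon-separated list. The sketch below mimics that lookup to show which language codes a value like "ja_JP.UTF-8" leads to. The `candidate_languages` helper is hypothetical and simplifies what gettext does internally.

```python
import os


def candidate_languages(environ=None):
    """Return language codes to try, most specific first.

    Simplified imitation of gettext's lookup, for illustration only.
    """
    environ = os.environ if environ is None else environ
    # Same variable order that Python's gettext documents.
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            break
    else:
        return []

    candidates = []
    for entry in value.split(":"):
        # "ja_JP.UTF-8" -> "ja_JP" (drop the encoding and any @modifier)
        full = entry.split(".")[0].split("@")[0]
        # Try the raw entry, then the locale, then the bare language.
        for code in (entry, full, full.split("_")[0]):
            if code and code not in candidates:
                candidates.append(code)
    return candidates
```

So a catalog stored under either a `ja_JP` or a `ja` directory would be a plausible match for the setting above.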

manabuishii avatar Nov 27 '17 06:11 manabuishii

Hi everyone, just a link to a translation-related exchange. Are you OK to go forward, @manabuishii @Abdelazeem-Elhabyan?

yvanlebras avatar Dec 07 '17 16:12 yvanlebras

OK with this first push forward, @Abdelazeem-Elhabyan! Is the content of your docx file what you describe here? I will try creating one for us. I think it would be best to provide an English version of your .docx file. And rather than docx, we could use GitHub to centralize our translation-related tests...

yvanlebras avatar Dec 07 '17 16:12 yvanlebras

@manabuishii @Abdelazeem-Elhabyan just some info: we have enhanced the translation capabilities of the Galaxy client... https://github.com/galaxyproject/galaxy/blob/dev/client/galaxy/scripts/nls/locale.js so don't hesitate to update the translations for your preferred languages ;)

yvanlebras avatar Dec 07 '17 16:12 yvanlebras

What about using converters like this http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/txt2po.html and related infrastructure http://pootle.translatehouse.org/discover.html ?

yvanlebras avatar Dec 07 '17 19:12 yvanlebras

WIP

  • [ ] Change pot file name from message.pot to TOOL_DIRECTORY_NAME.pot
  • [ ] Support other tool types
  • [ ] Add test code

manabuishii avatar Dec 08 '17 03:12 manabuishii

@yvanlebras For tool translation, we first tried xml2po, but it did not work well, so we created manabuishii/galaxy-translation. So far we have only checked tool translation. We still need to check other tools such as txt2po or xxx2po for other pages.

manabuishii avatar Dec 08 '17 08:12 manabuishii

So after taking a little more time to look at the tool translation issue, I think we can go forward with splitting the xml line-by-line and finding ways to translate at least:

  • tool name

  • param name

  • param label

  • param help

  • option value (maybe only the part readable by the UI user to avoid issues with when tags and so on)

As for the help section, I think we can address it later....
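A possible sketch of collecting these strings from a tool XML, using only the standard library. The example tool and the `translatable_strings` function are made up for illustration. The sketch deliberately leaves the param `name` attribute and the option `value` attribute untouched, on the assumption that other parts of the tool (when tags, the command section) refer to them.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal tool definition, for illustration only.
TOOL_XML = """<tool id="example" name="Example tool">
  <inputs>
    <param name="input1" type="data" label="Input dataset" help="Dataset to process"/>
    <param name="mode" type="select" label="Mode">
      <option value="fast">Fast mode</option>
      <option value="full">Full mode</option>
    </param>
  </inputs>
</tool>"""


def translatable_strings(tool_xml):
    """Return user-visible strings of a tool, in document order, no duplicates."""
    root = ET.fromstring(tool_xml)
    strings = []

    def add(text):
        text = (text or "").strip()
        if text and text not in strings:
            strings.append(text)

    add(root.get("name"))  # tool name
    for param in root.iter("param"):
        add(param.get("label"))  # param label
        add(param.get("help"))  # param help
        for option in param.findall("option"):
            # Only the text shown in the UI; the value attribute is kept as-is.
            add(option.text)
    return strings
```

The resulting list could then be written out as msgid entries of a pot file for translators.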

yvanlebras avatar Dec 08 '17 15:12 yvanlebras

Did you use the same approach to translate documentation?

jessMaia avatar Jan 19 '18 20:01 jessMaia

Which documentation ?

yvanlebras avatar Jan 19 '18 21:01 yvanlebras

manabuishii answered my question. Thanks!

What I meant by documentation were things like the Tool descriptions in : https://usegalaxy.org/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fmiller-lab%2Fgenome_diversity%2Fgd_pca%2F1.0.0&version=1.0.0&__identifer=81j7z0mqgdg

For example: "What it does

The user selects a gd_ped dataset generated by the Prepare Input tool. The PCA tool runs a Principal Components Analysis on the input genotype data and constructs a plot of the top two principal components. It also reports the following estimates of the statistical significance of the analysis."

jessMaia avatar Jan 19 '18 22:01 jessMaia

Ok, you mean the help section of a tool...

yvanlebras avatar Jan 20 '18 04:01 yvanlebras

Have you already worked on this kind of task? Are you interested in working on it?

yvanlebras avatar Jan 20 '18 08:01 yvanlebras

Maybe the updates of this discussion can be of interest: https://github.com/galaxyproject/training-material/issues/404

yvanlebras avatar May 25 '18 13:05 yvanlebras