Localization of Apertium Tools

Status: Open. mr-martian opened this issue 4 years ago • 11 comments

I was talking to @jonorthwash about localization, and it came up that all our command-line tools are currently hardcoded to English only. It would be good if this were otherwise.

I'm uncertain how to effectively/efficiently implement this, but if anyone has any ideas, I'd be happy to do the setting up (since I'm probably digging through all the string handling code this summer anyway).

mr-martian, Mar 19 '21 01:03

I think gettext is still pretty much the standard.
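Here is a minimal sketch of the gettext approach for a C++ console tool. It assumes a glibc-style system where gettext() lives in libc (elsewhere, link with -lintl). The text domain "apertium-demo" and the catalog directory are hypothetical. Only LC_MESSAGES and LC_CTYPE are taken from the environment, so numeric formatting stays in the "C" locale.

```cpp
#include <clocale>
#include <libintl.h>
#include <string>

// Hypothetical text domain and catalog directory, for illustration only.
// Catalogs would be looked up as <dir>/<lang>/LC_MESSAGES/<domain>.mo
constexpr const char* kDomain = "apertium-demo";
constexpr const char* kLocaleDir = "/usr/share/locale";

void init_i18n() {
    std::setlocale(LC_MESSAGES, "");  // which language to translate into
    std::setlocale(LC_CTYPE, "");     // character handling for the output
    // LC_NUMERIC is deliberately left at "C" so numbers in pipes keep '.'
    bindtextdomain(kDomain, kLocaleDir);
    bind_textdomain_codeset(kDomain, "UTF-8");  // return UTF-8 translations
    textdomain(kDomain);
}

// gettext() returns the translation if a catalog entry exists, and
// otherwise the msgid itself, so English stays the fallback.
std::string greeting() {
    return gettext("Hello, World!");
}
```

With no catalog installed, greeting() simply returns "Hello, World!". Translators would edit a plain-text .po file, which msgfmt compiles into the .mo catalog.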

flammie, Mar 19 '21 02:03

We use ICU anyway, so we have access to its locale support (https://unicode-org.github.io/icu/userguide/locale/) and the ICU message format (https://medium.com/i18n-and-l10n-resources-for-developers/the-missing-guide-to-the-icu-message-format-d7f8efc50bab).

TinoDidriksen, Mar 19 '21 06:03

Also, we'd need to decide on what not to localize. E.g., floating point weights and other data that is used in pipes must have a single fixed format, regardless of locale.
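One way to keep pipe data locale-independent in C++ is to imbue each stream explicitly instead of relying on the global locale. The following is a sketch that assumes weights are written with iostreams. CommaDecimal is a hypothetical stand-in for a user locale such as de_DE, defined inline so the example doesn't depend on which locales are installed.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Pipe data: always the classic "C" locale ('.' decimal point, no digit
// grouping), regardless of the user's environment or the global locale.
std::string format_weight(double w) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << w;
    return out.str();
}

// Hypothetical stand-in for a user locale with ',' as decimal point and
// '.' as thousands separator, grouped in threes.
struct CommaDecimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Human-facing messages: use whatever locale the caller supplies,
// e.g. std::locale("") for the user's environment.
std::string format_for_humans(double value, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

Note that std::locale("") can throw std::runtime_error when the environment names a locale that isn't installed, so a real tool would want a fallback to std::locale::classic().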

TinoDidriksen, Mar 19 '21 07:03

So things like lt-proc --help and error messages?

If we start translating error messages, we should give them unique error codes. E.g., the one about empty left-hand sides could read "AP3141526: Feil: Ugyldig ordbok (hint: venstresida av eit oppslag er tomt)" (Norwegian Nynorsk for "Error: Invalid dictionary (hint: the left-hand side of an entry is empty)"), so that searching online for AP3141526 leads to a wiki page (see how shellcheck does it: https://github.com/koalaman/shellcheck/wiki/SC2154).
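A sketch of how stable codes and translated text could be kept apart. The catalog contents are illustrative, not an existing Apertium error list, and tr is a hypothetical translation hook (in practice a wrapper around gettext or ICU). The code itself is never translated, so it stays searchable in every locale.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical catalog: the English text doubles as the msgid that
// translators see.
const std::map<std::string, std::string>& error_catalog() {
    static const std::map<std::string, std::string> catalog = {
        {"AP3141526",
         "Invalid dictionary (hint: the left-hand side of an entry is empty)"},
    };
    return catalog;
}

// Only the human-readable parts go through the translation function tr;
// the code prefix is emitted verbatim.
template <typename Translate>
std::string format_error(const std::string& code, Translate tr) {
    const auto& catalog = error_catalog();
    const auto it = catalog.find(code);
    if (it == catalog.end()) {
        throw std::out_of_range("unknown error code: " + code);
    }
    return code + ": " + tr("Error") + ": " + tr(it->second);
}
```

With an identity tr this prints the English message. With a Nynorsk catalog it produces the message quoted above, still prefixed with AP3141526.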

unhammer, Mar 19 '21 08:03

Added to https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

GSoC Coding Challenge: Make a trivial C++ Hello World app where the Hello World string is translated using ICU's API and external file format.

Tools that should be localized (see https://github.com/topics/apertium-core):

  • https://github.com/apertium/lttoolbox
  • https://github.com/apertium/apertium
  • https://github.com/apertium/apertium-lex-tools
  • https://github.com/apertium/apertium-separable
  • https://github.com/apertium/apertium-recursive
  • https://github.com/apertium/apertium-anaphora
  • https://github.com/apertium/lexd

All communication should ideally go via our IRC channel (Freenode #apertium) or apertium-stuff mailing list. We don't use issues for chatting.

TinoDidriksen, Mar 20 '21 22:03

The error codes are certainly good for googlability, even without localization. As for localized number formats: if a number is part of the stream format that the tools need fixed, then hard-coding the English format is a good thing, but elsewhere it's fine to use the format from the user's locale.

As a translator, I'd prefer a format that can be edited in a plain text editor, like gettext's .po files, which ICU seems to support too, if possible. I really dislike that most FLOSS projects these days are moving towards being localizable only through some horrible browser-based apps...

flammie, Mar 23 '21 15:03

If I understand this correctly, the problem we need to solve here is localizing the text used in different applications into different languages, based on the locale set up on the user's machine? I have not worked on localization of C++ apps, but I have worked with C++, and on localization of apps written in other languages. If I've got it right, I'll be happy to try my hand at localizing a C++ Hello World app and present it to you. Does it need to be a desktop app, or would a console app work fine?

gat786, Apr 04 '21 09:04

Only the console apps matter for this. And localizing a Hello World app using ICU should teach you all you need to know about how it reacts to locale.

TinoDidriksen, Apr 04 '21 09:04

Also, all communication should ideally go via our IRC channel (Freenode #apertium) or apertium-stuff mailing list. We don't use issues for chatting.

TinoDidriksen, Apr 04 '21 09:04

Okay, I will join the channels.

gat786, Apr 04 '21 09:04

I was just thinking, for whoever ends up working with these strings, that there's a lot of room for improvement in the Apertium tools' printouts. For example, when one compiles something, the only printout is something like:

main@standard 67820 212710

This is probably fine for debugging by someone who has worked with the tools, but for usability it should be rewritten in human language and made translatable.
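A sketch of what a human-readable, translatable version of that line could look like. It assumes the two numbers are the section's state and transition counts. fill and describe_section are hypothetical helpers, and tr stands for whatever translation hook the tools end up using. Named placeholders let translators reorder the values in their sentence.

```cpp
#include <string>

// Replace every "{key}" in tmpl with value.
std::string fill(std::string tmpl, const std::string& key,
                 const std::string& value) {
    const std::string slot = "{" + key + "}";
    for (auto pos = tmpl.find(slot); pos != std::string::npos;
         pos = tmpl.find(slot, pos + value.size())) {
        tmpl.replace(pos, slot.size(), value);
    }
    return tmpl;
}

// Hypothetical replacement for "main@standard 67820 212710", ASSUMING the
// numbers are state and transition counts. The template is translated
// first, then the values are substituted into the translated text.
template <typename Translate>
std::string describe_section(const std::string& name, long states,
                             long transitions, Translate tr) {
    std::string msg =
        tr("Section {name}: {states} states, {transitions} transitions");
    msg = fill(msg, "name", name);
    msg = fill(msg, "states", std::to_string(states));
    msg = fill(msg, "transitions", std::to_string(transitions));
    return msg;
}
```

This naive template would print "1 states" for singular counts. Plural-aware formatting, such as gettext's ngettext() or the plural syntax in ICU MessageFormat, handles that per language.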

flammie, Apr 06 '21 09:04