Automated translations

Open leeper opened this issue 11 years ago • 5 comments

Add automated translations using translate or translateR

leeper, Aug 16 '14 15:08

I've been thinking about the workflow for this. It goes something along these lines:

  1. Update the PO files to ensure that all the latest messages are included. I think this is just a call to make_translation(), but I may have misunderstood.
  2. If the PO file for the target language does not exist, or the user decides to translate all messages, retrieve them using get_messages().
  3. If the PO file already exists, the user should have the option to translate only the messages that haven't been translated already. In a PO object, that's all the direct messages where msgstr == "". For countable messages, it's where at least one plural form is blank, i.e., vapply(msgstr, function(x) !all(nzchar(x)), logical(1)). It might also need to include messages with fuzzy in their flags_comments field.
  4. Countable messages need dummy numeric values substituted into them. For example, in most languages, you'd have to substitute 1 and 2 (or any number greater than 1).
  5. Send them off to the translation engine.
  6. Match the returned translations back up to their IDs.
  7. Write the new translations into the PO file along with updated metadata.
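
A minimal sketch of steps 3 and 4, assuming the direct and countable messages are available as data frames. The tables below are hypothetical stand-ins, and treating flags_comments as a plain character column is an assumption made for illustration; the real PO object may store it differently.

```r
# Hypothetical stand-ins for the direct and countable tables of a PO object;
# the column layout is an assumption for illustration only.
direct <- data.frame(
  msgid = c("Hello", "Goodbye", "Thanks"),
  msgstr = c("Bonjour", "", "Merci"),
  flags_comments = c("", "", "fuzzy"),
  stringsAsFactors = FALSE
)
countable <- data.frame(
  msgid = c("%d file", "%d error"),
  msgid_plural = c("%d files", "%d errors"),
  flags_comments = c("", ""),
  stringsAsFactors = FALSE
)
# msgstr is a list column: one character vector of plural forms per row
countable$msgstr <- list(c("%d fichier", "%d fichiers"), c("%d erreur", ""))

is_fuzzy <- function(flags) grepl("fuzzy", flags, fixed = TRUE)

# Step 3: direct messages with an empty msgstr, or flagged fuzzy
direct_todo <- direct$msgstr == "" | is_fuzzy(direct$flags_comments)

# Step 3: countable messages with at least one blank form, or flagged fuzzy
countable_todo <- vapply(
  countable$msgstr, function(x) !all(nzchar(x)), logical(1)
) | is_fuzzy(countable$flags_comments)

# Step 4: substitute dummy counts (1 for singular, 2 for plural)
to_send_singular <- sprintf(countable$msgid[countable_todo], 1L)
to_send_plural <- sprintf(countable$msgid_plural[countable_todo], 2L)
```

Note that sprintf() only works here because each message has exactly one %d placeholder; messages with additional format specifiers would need a more careful substitution.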

richierocks, Jan 04 '17 01:01

Steps 1 and 2 can be achieved by just calling make_translation(). That will return a "po" object containing the untranslated messages (or any translations that already exist in the .po file).

I think that means we basically need two new functions:

  1. [ ] Extraction function to extract untranslated strings from the "po" object (to achieve 3-4)
  2. [ ] Assignment function to set the translation somehow based upon the original string (to achieve 6)

Then write_translation() handles 7.
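
As a rough sketch of what those two functions might look like for direct messages: the names extract_untranslated() and set_translations() are hypothetical, and a plain data frame stands in for the direct part of the "po" object.

```r
# Hypothetical helper: return the msgids that still need translating.
extract_untranslated <- function(direct) {
  direct$msgid[!nzchar(direct$msgstr)]
}

# Hypothetical helper: write translations back, matched on the original string.
# `translations` is a named character vector: names are msgids, values msgstrs.
set_translations <- function(direct, translations) {
  idx <- match(names(translations), direct$msgid)
  if (anyNA(idx)) {
    stop("unknown msgid(s): ", toString(names(translations)[is.na(idx)]))
  }
  direct$msgstr[idx] <- unname(translations)
  direct
}

direct <- data.frame(
  msgid = c("Hello", "Goodbye"),
  msgstr = c("Bonjour", ""),
  stringsAsFactors = FALSE
)

todo <- extract_untranslated(direct)

# Stand-in for the translation engine (steps 5 and 6): a fixed lookup table
engine_output <- c(Goodbye = "Au revoir")

direct <- set_translations(direct, engine_output[todo])
```

Matching on msgid alone would be ambiguous if the same string appears under more than one msgctxt, so this only illustrates the shape of the two functions.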

It occurs to me that we may want either of the following:

  1. Give the msgid's some kind of identifier so that they are easier to refer to programmatically, or
  2. Make the "po" objects something like S4 or R6 objects, so that we can embed assignment functions within the object itself to do something like:
po$translate(msg1, "translated string")
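
For the R6 route, a minimal sketch could look like the following. The class name PoSketch and its single direct field are hypothetical, not the actual "po" class; the point is only to show a translate() method that modifies the object in place.

```r
library(R6)

# Hypothetical class; the real "po" object may be structured differently.
PoSketch <- R6Class(
  "PoSketch",
  public = list(
    direct = NULL,
    initialize = function(direct) {
      self$direct <- direct
    },
    # Set the translation for one message, looked up by its original string
    translate = function(msgid, msgstr) {
      idx <- match(msgid, self$direct$msgid)
      if (is.na(idx)) stop("unknown msgid: ", msgid)
      self$direct$msgstr[idx] <- msgstr
      invisible(self)  # returning self allows method chaining
    }
  )
)

po <- PoSketch$new(data.frame(
  msgid = c("Hello", "Goodbye"),
  msgstr = c("", ""),
  stringsAsFactors = FALSE
))
po$translate("Hello", "Bonjour")$translate("Goodbye", "Au revoir")
```

Because R6 objects have reference semantics, po is updated without reassignment, which is what makes the po$translate(...) style workable.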

What do you think?

leeper, Jan 04 '17 19:01

An identifier should be reasonably straightforward to add. You paste the msgid and the msgctxt, then call digest::digest() on each row to create a hash.
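
Roughly, that could be sketched as below. The example table, the "\r" separator, and the choice of md5 are assumptions for illustration; serialize = FALSE makes digest() hash the string itself rather than its serialized R representation.

```r
library(digest)

# Hypothetical direct messages; "Open" appears under two different contexts
direct <- data.frame(
  msgid = c("Open", "Open", "Close"),
  msgctxt = c("verb", "adjective", ""),
  stringsAsFactors = FALSE
)

# Paste msgid and msgctxt into one key per row (separator is an assumption)
key <- paste(direct$msgid, direct$msgctxt, sep = "\r")

# Hash each row's key to get a fixed-length identifier
direct$hash <- vapply(
  key, digest, character(1),
  algo = "md5", serialize = FALSE, USE.NAMES = FALSE
)
```

Including msgctxt in the key means the two "Open" rows get different identifiers even though their msgid is the same.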

Making the po objects R6 objects is also possible, but it will likely take me a couple of days, so I won't be able to push to CRAN until this weekend.

richierocks, Jan 04 '17 20:01

Hashing is a good idea - much easier. Let's do that.

leeper, Jan 04 '17 20:01

I've just done a rewrite using R6, with hash values auto-generated when you read the direct or countable elements.

The tests are broken but it should be a quick fix tomorrow. The API is the same as before, so it shouldn't break msgtools (though let me know if you have any problems).

richierocks, Jan 05 '17 04:01