
Support `.mo` files for compilation

Open maennchen opened this issue 3 years ago • 2 comments

Expo supports parsing / writing .mo files, which are a lot faster to read since it is a simple binary format.
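To illustrate why the binary format is cheap to read, here is a minimal sketch of the GNU .mo layout in Python. It is not Expo's or Gettext's implementation: `write_mo` and `read_mo` are hypothetical helper names, and the sketch assumes little-endian files, UTF-8 strings, no hash table, and no plural forms or contexts.

```python
import struct

MAGIC = 0x950412DE  # magic number of a .mo file, read as little-endian
HEADER = "<7I"      # magic, revision, count, originals table, translations table, hash size, hash offset


def write_mo(messages):
    """Serialize msgid -> msgstr pairs into minimal .mo bytes (sketch, no hash table)."""
    # The format expects the original strings to be sorted.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    orig_table = struct.calcsize(HEADER)  # 28 bytes of header come first
    trans_table = orig_table + 8 * n      # each table entry is (length, offset)
    data_start = trans_table + 8 * n

    blob = b""
    orig_entries, trans_entries = [], []
    for msgid, _ in items:
        orig_entries.append((len(msgid), data_start + len(blob)))
        blob += msgid + b"\0"  # strings are NUL-terminated; the length excludes the NUL
    for _, msgstr in items:
        trans_entries.append((len(msgstr), data_start + len(blob)))
        blob += msgstr + b"\0"

    out = struct.pack(HEADER, MAGIC, 0, n, orig_table, trans_table, 0, 0)
    for length, offset in orig_entries + trans_entries:
        out += struct.pack("<2I", length, offset)
    return out + blob


def read_mo(data):
    """Read the msgid -> msgstr pairs back with plain offset lookups, no tokenizing."""
    magic, _rev, n, orig_table, trans_table, _hsize, _hoff = struct.unpack_from(HEADER, data, 0)
    if magic != MAGIC:
        raise ValueError("not a little-endian .mo file")
    messages = {}
    for i in range(n):
        olen, ooff = struct.unpack_from("<2I", data, orig_table + 8 * i)
        tlen, toff = struct.unpack_from("<2I", data, trans_table + 8 * i)
        messages[data[ooff:ooff + olen].decode("utf-8")] = data[toff:toff + tlen].decode("utf-8")
    return messages
```

Reading is a fixed header plus two offset tables, which is the main reason a .mo loads faster than a .po that has to be parsed as text.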

I would like to support it here as well.

Proposed changes:

  • Users can choose whether they want to stay with only the .po files or also use the .mo files.
  • On merge a .mo file is also written
  • On compile the .mo file is loaded instead of the .po
    • if the .po file is newer than the .mo file, I would log a warning for the user to update the .mo since they likely edited the .po by hand and forgot to call merge again
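The staleness check from the last bullet could look roughly like the following Python sketch. It only compares file modification times; `mo_is_stale` and `warn_if_stale` are hypothetical names, not part of Gettext or Expo, and the warning text is made up for illustration.

```python
import os
import warnings


def mo_is_stale(po_path, mo_path):
    """Return True when the .po was modified more recently than the .mo."""
    return os.path.getmtime(po_path) > os.path.getmtime(mo_path)


def warn_if_stale(po_path, mo_path):
    """Emit a warning (hypothetical wording) if the .mo looks out of date."""
    if mo_is_stale(po_path, mo_path):
        warnings.warn(
            f"{po_path} is newer than {mo_path}; "
            "the .po was probably edited by hand, re-run merge to update the .mo"
        )
```

One caveat with this approach: modification times are not preserved by a fresh version-control checkout, so a check like this is a hint rather than a guarantee.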

maennchen avatar Jul 20 '22 21:07 maennchen

@maennchen how do other Gettext implementations (for other languages) tackle MO files?

whatyouhide avatar Jul 30 '22 12:07 whatyouhide

@whatyouhide Most (C-based) implementations I've used so far only read the .mo for runtime translations. The .po / .pot is only used for extraction / to help with merge problems.

maennchen avatar Jul 30 '22 14:07 maennchen

@maennchen can we close this now that Expo supports MO files?

whatyouhide avatar Dec 17 '22 09:12 whatyouhide

@whatyouhide That was the next issue I wanted to tackle:

Support .mo in Gettext itself. I think (we have to benchmark it) it makes compilation faster.

maennchen avatar Dec 17 '22 10:12 maennchen

@maennchen got it, makes sense. This would require a slightly different workflow for Gettext entirely, right? We'd have to dump POs and MOs, and read MOs if present, falling back to PO? Do you have an exact workflow in mind? I ask because I have some cycles I can dedicate to Gettext 😉

whatyouhide avatar Dec 17 '22 13:12 whatyouhide

@whatyouhide I wanted to make the file handling strategy configurable (at least at the start to prevent breaking changes)

  • Strategy 1: As it is now
  • Strategy 2:
    • Extract creates .pot
    • Merge updates .po and directly writes a .mo as well
    • On compile we read the .mo (and maybe emit a warning if the modification time of the .po is newer than that of the .mo)
    • For updating the .mo after .po changes, the following ways are possible:
      • The gettext editor used to edit the .po already output the .mo
      • Re-run merge
      • maybe add a new task that only does that (similar to https://github.com/elixir-gettext/expo/blob/main/lib/mix/tasks/expo.msgmft.ex)
      • use the native msgfmt from GNU gettext
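A configurable file-handling strategy could be sketched like this in Python. The strategy names `"po_only"` and `"prefer_mo"` and the function `select_translation_file` are hypothetical, and falling back to the .po when no .mo exists is an assumption taken from the question earlier in this thread, not a decided design.

```python
from pathlib import Path


def select_translation_file(po_path, strategy="po_only"):
    """Pick the file to load at compile time for one locale/domain.

    "po_only"   -> Strategy 1: always read the .po, as it is now.
    "prefer_mo" -> Strategy 2: read the sibling .mo if it exists,
                   otherwise fall back to the .po (assumed behaviour).
    """
    po_path = Path(po_path)
    if strategy == "po_only":
        return po_path
    if strategy == "prefer_mo":
        mo_path = po_path.with_suffix(".mo")
        return mo_path if mo_path.exists() else po_path
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Keeping `"po_only"` as the default means existing projects see no change unless they opt in.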

maennchen avatar Dec 17 '22 13:12 maennchen

Do you envision MOs being committed in version control? Is this the flow used by GNU Gettext, if you know?

whatyouhide avatar Dec 17 '22 16:12 whatyouhide

@whatyouhide I intend to commit them.

Gettext itself has no opinion about .mo files in VCS as far as I'm aware.

I know from the PHP ecosystem that in most cases .mo files are committed. I have also encountered the opinion that they should not be committed and should only be generated on demand / for releases.

Speaking for myself: I would commit them and would not be concerned about conflicts in .mo files since you can always regenerate them from merged .po files.

Because there seem to be different opinions about this, I wanted to implement it as a configurable strategy so that people can decide how they want to handle it.

maennchen avatar Dec 17 '22 17:12 maennchen

(I closed this by accident, sorry about that!)

My guess would be that these files should not be committed, as essentially they're a duplicated "cache" of PO files anyways. I’m ok with configuration, but I'd like to keep simplicity as much as possible. For example, before diving into this, I'd ask: does Gettext compilation take significant time today? Are we sure introducing MO files, which increases complexity, is worth it?

whatyouhide avatar Dec 17 '22 18:12 whatyouhide

@whatyouhide

Performance Impact

In a bigger application like https://github.com/jshmrtn/hygeia, the parsing of the .po file takes around 0.2s per language on my machine. If the performance comparison of https://github.com/elixir-gettext/expo/issues/21 is still more or less accurate, potentially around 75% of the time could be saved. (~ 0.8s)

I think generating the functions inside the backend takes longer than the actual parsing, though. So having a look at that performance might make a bigger difference.

An even bigger impact is the compile time dependency of all the modules using the gettext backend. Changing one translation currently means recompiling most of the applications it is used in.

Committing

I think committing .mo files is ok. Most people are also committing .pot files even though they're technically just cached extractions. Depending on the project, the line of how much we want to "cache" can be different. In bigger applications, I might want to make the trade-off and in a quick demo project not.

I also don't seem to be alone in this opinion. There are currently over 132 million checked-in .mo files on GitHub: https://github.com/search?l=&q=extension%3Amo&type=code

maennchen avatar Dec 17 '22 19:12 maennchen

> An even bigger impact is the compile time dependency of all the modules using the gettext backend. Changing one translation currently means recompiling most of the applications it is used in.

Yeah, especially because changing a translation does not change the code generated at compile-time.

josevalim avatar Dec 17 '22 20:12 josevalim

Considering the added complexity of supporting MO files, I'd definitely shift our focus to the compile-time dependencies and function generation, yeah.

whatyouhide avatar Dec 18 '22 07:12 whatyouhide

Discussion moved to #330

maennchen avatar Dec 20 '22 11:12 maennchen

Great, thanks @maennchen. I will close this for now then, and we can reopen in case this comes up again. Thanks! 💟

whatyouhide avatar Dec 20 '22 16:12 whatyouhide