[DRAFT] An approach to linting the translations

Open Quuxplusone opened this issue 4 years ago • 2 comments

As noted in #154, it's very easy to "break" the translation files. If we wanted, we could automate the detection of broken translations by comparing the strings in language-??.cpp against the string literals in the actual program source code. I wrote a little proof of concept that uses a Python script to extract the string literals from the program. Run make to see the (very spammy) output:

Unused translation in PL,CZ,RU,DE: "\n\nOnce you collect 10 Bomberbird Eggs, stepping on a cell with no adjacent mines also reveals the adjacent cells. Collecting even more Eggs will increase the radius. Additionally, collecting 25 Bomberbird Eggs will reveal adjacent cells even in your future games."
Closest match in program: "\n\nOnce you collect a Bomberbird Egg, stepping on a cell with no adjacent mines also reveals the adjacent cells. Collecting even more Eggs will increase the radius."
Unused translation in PL,CZ,RU,TR,DE: " (E:%1)"
Closest match in program: " (e)"
Unused translation in PL,CZ,RU,TR,DE,PT: " (expired)"
Closest match in program: " (Emerald)"
Unused translation in PL,CZ,RU: " (increases treasure spawn)"
Closest match in program: " (killing increases treasure spawn)"
Unused translation in DE: " Hell: %1/9"
Closest match in program: " Hell: %1/%2"
Unused translation in PL,CZ,RU,TR,DE,PT: " [%1 turns]"
Closest match in program: "%1 turns"
Unused translation in PL,CZ,RU,TR,DE: " kills: %1"
Closest match in program: "kills: %1"
Unused translation in PL,CZ,RU,TR,DE: "\"By now, you should have your own formula, you know?\""
Closest match in program: "\"I would like to congratulate you again!\""
Unused translation in PL,CZ,RU,TR,DE: "\"I like collecting ambers at the beach.\""
Closest match in program: "finer lines at the boundary"
Unused translation in PL,CZ: "#%1, cells: %2"
Closest match in program: "bad cells: %1"
Unused translation in PL,CZ,RU,TR,DE: "%1 takes %his1 revenge on %the2!"
Closest match in program: "%The1 takes %his1 revenge on %the2!"
Unused translation in PL,CZ,RU,TR,DE: "%The1 bites %the2!"
Closest match in program: "%The1 eats %the2!"
Unused translation in PL,CZ,RU,TR,DE,PT: "%The1 breaks the mirror!"
Closest match in program: "%The1 breathes fire!"
Unused translation in PL,CZ,RU,TR,DE,PT: "%The1 disperses the cloud!"
Closest match in program: "%The1 fills the hole!"
[...and so on, and so on ... 674 lines...]

I don't expect this to be merged as-is, but it might serve as a jumping-off point for someone either to polish this automated detector or to go through its spammy output and make a pull request fixing the low-hanging fruit, e.g. changing "%1 takes %his1 revenge on %the2!" into "%The1 takes %his1 revenge on %the2!".
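
For anyone picking this up, here is a minimal Python sketch of the same idea. It is not the script from this PR, and the function names are hypothetical. It assumes that each entry in a language-??.cpp file looks like S("English", "translation") and that the first literal is the key. It uses difflib to pick the "closest match", which may not be how the proof of concept picks it. It also ignores adjacent-literal concatenation, raw strings and comments, and it cannot see strings built from non-literals.

```python
from __future__ import annotations

import difflib
import re

# A C/C++ double-quoted string literal; backslash escapes are kept as written.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')

# Assumed translation-entry shape: S("English text", "translated text").
# Only the first (English) argument is captured.
TRANSLATION_ENTRY = re.compile(r'\bS\(\s*"((?:[^"\\\n]|\\.)*)"')


def program_strings(source: str) -> set[str]:
    """Every string literal in the program source."""
    return set(STRING_LITERAL.findall(source))


def translation_keys(language_file: str) -> set[str]:
    """The English key of every S(...) entry in a language file."""
    return set(TRANSLATION_ENTRY.findall(language_file))


def unused_translations(program_src: str, language_file: str) -> list[tuple[str, str | None]]:
    """Pair each translation key missing from the program with its closest program string."""
    candidates = sorted(program_strings(program_src))
    unused = sorted(translation_keys(language_file) - set(candidates))
    report = []
    for key in unused:
        # cutoff=0.0: always return the best candidate if there is any candidate at all.
        best = difflib.get_close_matches(key, candidates, n=1, cutoff=0.0)
        report.append((key, best[0] if best else None))
    return report


if __name__ == "__main__":
    program = 'addMessage(XLAT("%The1 takes %his1 revenge on %the2!", m, c));'
    language = (
        'S("%1 takes %his1 revenge on %the2!", "...")\n'
        'S("%The1 takes %his1 revenge on %the2!", "...")\n'
    )
    for key, match in unused_translations(program, language):
        print(f"Unused translation: {key!r}")
        print(f"Closest match in program: {match!r}")
```

Running it pairs the stale "%1 takes ..." key with "%The1 takes ...", the same kind of hit as in the output above. difflib's ratio is only one choice of similarity measure. A smarter one could, for example, weight mismatched %-placeholders more heavily.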

Quuxplusone, Jun 28 '21 20:06

Have you seen devmods/gentrans.cpp? It does a similar job (looking for texts that should be translated).

zenorogue, Jun 28 '21 20:06

I had not seen it, no. IIUC, gentrans.cpp performs the opposite operation from what I did in this PR: it looks for strings in the program that lack any translations, as opposed to translations that are unreachable from anywhere in the program.
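
The two directions are just the two set differences between "strings the program uses" and "strings the language files translate". Here is a toy Python sketch with a hypothetical function name. It is not how gentrans.cpp is implemented. A real "untranslated" check would also have to restrict itself to user-visible strings and report source locations and non-literals, as gentrans does in its output, and this sketch ignores all of that.

```python
from __future__ import annotations


def compare_directions(program_strings: set[str], translation_keys: set[str]) -> tuple[list[str], list[str]]:
    """Split the mismatch between program strings and translation keys into two sorted lists."""
    # Strings the program uses that have no translation (the question gentrans answers).
    untranslated = sorted(program_strings - translation_keys)
    # Translation keys that no longer appear in the program (the question this PR answers).
    unreachable = sorted(translation_keys - program_strings)
    return untranslated, unreachable


if __name__ == "__main__":
    program = {"kills: %1", "HyperRogue %1: online demo"}
    translated = {"kills: %1", " kills: %1"}
    untranslated, unreachable = compare_directions(program, translated)
    print("untranslated:", untranslated)  # strings to add to the language files
    print("unreachable:", unreachable)    # stale entries to fix or delete
```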

$ make mymake
$ ./mymake devmods/gentrans
$ ./hyper -gentrans
[...]
001076 // checking all the files
001099 S("HyperRogue %1: online demo", literal in hyperweb.cpp:142)
001099 S("play the game", literal in hyperweb.cpp:145)
001099 S("learn about hyperbolic geometry", literal in hyperweb.cpp:146)
001099 S("toggle high detail", literal in hyperweb.cpp:148)
001099 S("Temple of Cthulhu", literal in hyperweb.cpp:152)
001099 S("Land of Storms", literal in hyperweb.cpp:153)
001099 S("Burial Grounds", literal in hyperweb.cpp:154)
001109 S(
001109     "released under GNU General Public License version 2 and thus "
001109     "comes with absolutely no warranty; see COPYING for details\n\n"
001109     , literal in help.cpp:225)
001113 // unrecognized nonliteral: tour::slides[tour::currentslide].name in help.cpp:1046
001121 S("?", mdmodes[vid.monmode])
001123 S("One wrong move and it is game over!", literal in menus.cpp:757)
[...]

There's some low-hanging fruit in there, too; e.g. the translation files are inconsistent between "One wrong move and it is game over!" and "One wrong move, and it is game over!".
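
One cheap way to find more pairs like that is to group strings by a punctuation-insensitive key. Below is a minimal Python sketch with a hypothetical function name. It keeps letters, digits and '%', so placeholders such as %1 and %The1 survive and case still matters. Every other run of characters collapses to a single space, and only groups with more than one distinct spelling are reported.

```python
import re
from collections import defaultdict

# Anything other than letters, digits and '%' counts as punctuation or spacing.
PUNCTUATION = re.compile(r"[^0-9A-Za-z%]+")


def punctuation_variants(strings):
    """Group strings that are identical once punctuation and spacing are ignored."""
    groups = defaultdict(set)
    for s in strings:
        key = PUNCTUATION.sub(" ", s).strip()
        groups[key].add(s)
    # Only groups with two or more distinct spellings are interesting.
    return [sorted(group) for _, group in sorted(groups.items()) if len(group) > 1]


if __name__ == "__main__":
    keys = [
        "One wrong move and it is game over!",
        "One wrong move, and it is game over!",
        "kills: %1",
        " kills: %1",
    ]
    for group in punctuation_variants(keys):
        print(group)
```

Not every hit is a bug. " kills: %1" versus "kills: %1" may well be deliberate, so the output is a list for a human to review rather than a set of automatic fixes.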

Quuxplusone, Jun 28 '21 21:06