XUnity.AutoTranslator
Suggestion: improve manual translation quality with a user translation file applied after machine translation (post-processing).
Currently, _Substitutions.txt is applied first, so if I put an alternate word in _Substitutions.txt, AutoTranslator can no longer use the existing translation. If _Preprocessors.txt were applied before _Substitutions.txt, the user's existing translation would be printed as-is, and unnatural words could be added to _Substitutions.txt to improve the quality of machine translation.
Also, if user translations (of parts of sentences, not whole sentences) could be applied as post-processing to the machine-translated result (translated language -> translated language), and if regex could be used in post-processing, then unnatural expressions that keep repeating could be translated naturally.
ex) _Preprocessors.txt -> _Substitutions.txt -> existing translation files -> _AutoGeneratedTranslations.txt -> _Postprocessors.txt -> screen output
Also, regex currently cannot be used in _Substitutions.txt, which makes it less useful for manual translation. If regex were supported, certain Japanese words could be substituted without producing unnatural translations.
With these changes, the quality of machine translation could be improved with natural expressions, without having to translate every sentence manually.
Adding a feature that does preprocessing is certainly feasible. However, depending on how a given game is implemented, something like this could sink performance if overused (because some game developers misuse the engine).
Note that the current substitution and preprocessing implementations are very specialized features:
- Substitutions are generally applied first, and the translation "pipeline" looks up both the substituted and the non-substituted string before trying to translate it with a translation endpoint.
- Preprocessors on the other hand are only ever used right before a text is sent off to a translation endpoint and never at any other point.
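To make the order of these two stages concrete, here is a toy model in Python. It is not the plugin's code: the dictionaries, the `{0}` placeholder format, the function names, and the order in which the two lookups are tried are all assumptions for illustration.

```python
from typing import Callable

# Toy model of the lookup order described above; NOT the plugin's actual code.
# Assumptions (hypothetical): substitutions are plain string pairs, a matched
# term is replaced with a numbered placeholder like {0}, and the translation
# cache is a simple dict.
SUBSTITUTIONS = {"可愛らしい寝顔": "cute sleeping face"}
CACHE = {
    "うん…{0}ですね。": "Yeah... what a {0}.",
    "ゲームを終了しますか?": "Quit the game?",
}


def preprocess(text: str) -> str:
    # Example preprocessor: collapse a doubled ellipsis.
    return text.replace("……", "…")


def fill(template: str, values: list[str]) -> str:
    # Put the substituted values back in place of their placeholders.
    for i, value in enumerate(values):
        template = template.replace("{%d}" % i, value)
    return template


def lookup_or_translate(original: str, endpoint: Callable[[str], str]) -> str:
    # 1. Substitutions come first, replacing known terms with placeholders.
    substituted, values = original, []
    for term, replacement in SUBSTITUTIONS.items():
        if term in substituted:
            substituted = substituted.replace(term, "{%d}" % len(values))
            values.append(replacement)
    # 2. Both the substituted and the non-substituted string are looked up.
    for candidate in (substituted, original):
        if candidate in CACHE:
            return fill(CACHE[candidate], values)
    # 3. Preprocessors run only here, right before the endpoint is called.
    return fill(endpoint(preprocess(substituted)), values)
```

With a dummy endpoint such as `lambda s: "MT:" + s`, a cached line returns immediately, while an uncached one reaches the endpoint only after preprocessing, still carrying its placeholder.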
Adding a preprocessor as described here would:
- Modify the untranslated part of the string as it is written to _AutoGeneratedTranslations.txt. Changing the preprocessors afterwards may invalidate entries in this file, because the text may never appear in that form again.
- Cause the plugin to send half-translated, half-untranslated strings for translation. (This doesn't happen with substitutions, because the substituted part of the string is replaced with a variable name.)
Could you give some examples of what you hope to achieve? For example, untranslated/translated text pairs plus the preprocessor, translation, and/or post-processor that might produce them.
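The difference between the two bullets shows up in what reaches the endpoint and what would become the untranslated key in the cache. A minimal Python sketch; the replacement rule and the `{0}` placeholder format are hypothetical, chosen only to contrast the two approaches.

```python
# Hypothetical illustration of the two bullets above; not plugin code.
TEXT = "うん…可愛らしい寝顔をされるんですね。"
PHRASE = "可愛らしい寝顔をされるんですね"
ENGLISH = "You're sleeping like a cute baby"


def via_preprocessor(text: str) -> str:
    # Inserting English in a preprocessor yields a half-translated string.
    # This mixed string is what the endpoint receives and what would become
    # the untranslated key in _AutoGeneratedTranslations.txt.
    return text.replace(PHRASE, ENGLISH)


def via_substitution(text: str) -> tuple[str, list[str]]:
    # A substitution swaps the phrase for a variable name instead, so the
    # endpoint only sees Japanese plus a placeholder.
    values: list[str] = []
    if PHRASE in text:
        text = text.replace(PHRASE, "{0}")
        values.append(ENGLISH)
    return text, values


print(via_preprocessor(TEXT))   # うん…You're sleeping like a cute baby。
print(via_substitution(TEXT))   # ('うん…{0}。', ["You're sleeping like a cute baby"])
```

If the preprocessor rule is later edited, the mixed key may never be produced again, which is the invalidation problem in the first bullet.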
I'm not fluent in English, so I'm sorry for the slightly late reply; it took me a while to find suitable examples. First, everything I describe here happens after the one-second wait has passed and the text no longer changes. My idea is to use substitutions as much as is appropriate during machine translation and then refine the wording through post-processing. The "pre-translation" step mentioned earlier is meant to reuse translations that users have already made, and _Substitutions.txt is meant to be applied only if no such existing translation is found. Substitutions apply only when the untranslated text goes through the translator. Below, <Pretranslated.txt> = <User.txt> = translations already saved in _AutoGeneratedTranslations.txt.
Here is an example.
<user.txt>
おはようございます。可愛らしい寝顔をされるんですね=Good morning. You're sleeping like a cute baby
ゲームを終了しますか?=Quit the game?
イッちゃいそう…あっ=I'm gonna cum...ah
Suppose user.txt is saved as above and the following untranslated texts appear:
<_Untranslated text>
1: うん…可愛らしい寝顔をされるんですね。
2: はい、可愛らしい寝顔をされるんですね…。
3: ぁあうう……可愛らしい寝顔をされるんですね。!
After checking for an existing user translation, if regex could be used in _Substitutions.txt, the existing user translation would still be used, and the substitution could be applied in many different situations.
<_Existing checking>
<_Substitutions.txt> r:\b可愛らしい寝顔をされるんですね\b=You're sleeping like a cute baby
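One caveat with `\b` in this rule: in a Unicode regex, kana and kanji count as word characters, so `\b` only holds where the phrase touches punctuation or the start or end of the line. The sketch below uses Python's `re`; .NET regexes also treat Unicode letters as word characters by default, so the same behavior would most likely apply to this proposed syntax, but that is an assumption. The fourth sample is a made-up line added to show the failure case.

```python
import re

# The substitution pattern from the example, without the proposed "r:" prefix.
PATTERN = re.compile(r"\b可愛らしい寝顔をされるんですね\b")

samples = [
    "うん…可愛らしい寝顔をされるんですね。",
    "はい、可愛らしい寝顔をされるんですね…。",
    "ぁあうう……可愛らしい寝顔をされるんですね。!",
    "ああ可愛らしい寝顔をされるんですね",  # kana directly before the phrase
]

# Kana and kanji are \w characters, so \b only matches next to punctuation
# or at the start/end of the string.
for sample in samples:
    print(bool(PATTERN.search(sample)), sample)
# True, True, True, False
```

For Japanese text without spaces, dropping `\b` altogether may therefore match more of the intended lines.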
The text then passes through the translator, which translates the remaining untranslated parts. The translated phrases we want to change are put in <Post.txt> and refined before being printed. Like _Substitutions.txt, Post.txt should also apply to parts of a sentence.
<_Post.txt> r:\blike a cute baby\b=like a puppy
The process is summarized as follows.
- Existing check (output the existing translation if there is one) => _Substitutions.txt => Translator => Post.txt => original text and translation saved in _AutoGeneratedTranslations.txt
- ぁあうう…可愛らしい寝顔をされるんですね。! => ぁあうう…You're sleeping like a cute baby。! => Ahh ...You're sleeping like a cute baby.!! => Ahh ...You're sleeping like a puppy.!!
<_AutoGeneratedTranslations.txt>
ぁあうう…可愛らしい寝顔をされるんですね。!=Ahh ...You're sleeping like a puppy.!!
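The proposed order can be sketched in Python as a small function chain: existing check, regex substitutions, translator, regex post-processing, then caching. This is a hypothetical model of the proposal, not plugin code; `fake_translate` is a stand-in hard-wired to reproduce the translator output shown in the example, and the `$1` to `\g<1>` conversion is only there because the post writes group references in `$1` style.

```python
import re
from typing import Callable


def dollar_groups(template: str) -> str:
    # The post writes group references as $1; Python's re.sub expects \g<1>.
    return re.sub(r"\$(\d+)", r"\\g<\1>", template)


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    # Apply each (pattern, replacement) rule in order to the whole string.
    for pattern, replacement in rules:
        text = re.sub(pattern, dollar_groups(replacement), text)
    return text


def translate_line(
    text: str,
    existing: dict[str, str],
    substitutions: list[tuple[str, str]],
    translate: Callable[[str], str],
    post: list[tuple[str, str]],
) -> str:
    # 1. Existing check: a saved translation is returned unchanged.
    if text in existing:
        return existing[text]
    # 2. Regex substitutions on the untranslated text.
    partial = apply_rules(text, substitutions)
    # 3. Machine translation of whatever Japanese is left.
    translated = translate(partial)
    # 4. Regex post-processing, then save original -> result like the cache.
    result = apply_rules(translated, post)
    existing[text] = result
    return result


existing = {"ゲームを終了しますか?": "Quit the game?"}
subs = [(r"\b可愛らしい寝顔をされるんですね\b", "You're sleeping like a cute baby")]
post = [(r"\blike a cute baby\b", "like a puppy")]


def fake_translate(s: str) -> str:
    # Stand-in for the endpoint, hard-wired to reproduce the example output.
    return s.replace("ぁあうう…", "Ahh ...").replace("。!", ".!!")


print(translate_line("ぁあうう…可愛らしい寝顔をされるんですね。!",
                     existing, subs, fake_translate, post))
# Ahh ...You're sleeping like a puppy.!!
```

Because the existing check runs first, "ゲームを終了しますか?" keeps its saved translation and never reaches the substitution or post-processing rules.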
ex2) Machine translation of katakana words is often unnatural, but this can also be solved using substitutions and post-processing with regex.
<_Untranslated text> メニューを終了しますか?
<_Existing checking>
<_Substitutions.txt> r:\b([ァ-ヴー]+)を終了しますか\b=Quit the $1
<Post.txt> If the translation of certain words is unnatural, refine it in post-processing.
r:\b[qQ]uit the (menu|window)\b=Close the $1
- Untranslated text => Existing check => _Substitutions.txt => Translator => Post.txt => _AutoGeneratedTranslations.txt
- メニューを終了しますか? => Quit the メニュー? => Quit the menu? => Close the menu?
<_AutoGeneratedTranslations.txt>
メニューを終了しますか?=Close the menu?
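Two details of this example can be checked in Python. The class [ァ-ン] spans U+30A1 to U+30F3, so it excludes the prolonged sound mark ー (U+30FC) that メニュー ends with, and the pattern then fails to match. And a lookahead post rule such as `[qQ]uit the (?=menu|window)` consumes the trailing space, so a replacement of "Close the" glues the words together; a capture group avoids this. The widened class [ァ-ヴー] is one possible fix, not the only one.

```python
import re

TEXT = "メニューを終了しますか?"

# Katakana range ァ (U+30A1) to ン (U+30F3): excludes ー (U+30FC).
narrow = re.compile(r"\b([ァ-ン]+)を終了しますか\b")
# Widened range (up to ヴ, U+30F4) plus the prolonged sound mark ー.
wide = re.compile(r"\b([ァ-ヴー]+)を終了しますか\b")

print(narrow.search(TEXT))               # None
print(wide.sub(r"Quit the \1", TEXT))    # Quit the メニュー?

# Lookahead version: the consumed space is lost in the replacement.
print(re.sub(r"[qQ]uit the (?=menu|window)", "Close the", "Quit the menu?"))
# Close themenu?

# Capture-group version keeps the noun and the spacing intact.
post = re.compile(r"\b[qQ]uit the (menu|window)\b")
print(post.sub(r"Close the \1", "Quit the menu?"))   # Close the menu?
```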
ex3) Translations of H scenes are often unnatural because of moaning; this can also be partly solved.
<_Untranslated text> うん…イッちゃいそうだ…ぁあ…あ…
<_Existing checking>
<_Substitutions.txt> r:\bイッちゃいそうだ?\b=I'm gonna cum
<Post.txt> r:\b[Yy]eah\b(.{0,20}\bcum\b)=Uh$1
- Untranslated text => Existing check => _Substitutions.txt => Translator => Post.txt => _AutoGeneratedTranslations.txt
- うん…イッちゃいそうだ…ぁあ…あ… => うん…I'm gonna cum…ぁあ…あ… => Yeah ... I'm gonna cum ... ah ... ah ... => Uh ... I'm gonna cum ... ah ... ah ...
<_AutoGeneratedTranslations.txt>
うん…イッちゃいそうだ…ぁあ…あ…=Uh ... I'm gonna cum ... ah ... ah ...
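The post-processing rule in this example only rewrites "Yeah" when "cum" follows within 20 characters, which keeps it from touching unrelated lines. A Python check of that bound; the two extra test sentences are made up, and `$1` is written as `\1` because that is Python's syntax.

```python
import re

# Post rule from the example: "Yeah"/"yeah" -> "Uh", but only when "cum"
# appears within the next 20 characters (captured and kept as group 1).
RULE = re.compile(r"\b[Yy]eah\b(.{0,20}\bcum\b)")


def postprocess(text: str) -> str:
    return RULE.sub(r"Uh\1", text)


print(postprocess("Yeah ... I'm gonna cum ... ah ... ah ..."))
# Uh ... I'm gonna cum ... ah ... ah ...
print(postprocess("Yeah, that sounds good."))
# Yeah, that sounds good.
```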
Therefore, if a regex-enabled _Substitutions.txt were applied after the existing-translation check, and a regex-enabled Post.txt could be applied after machine translation, it would improve both translation quality and the handling of repeated phrases.
If all of the above is difficult to implement, supporting regex in _Substitutions.txt alone would still help.
About substitutions:
They cannot use regex and never will, but if you simply want to replace parts of a sentence with something "pre-translated" from the substitution file, you can probably already use them as you describe. If you are worried about placeholders like {0} being written to _AutoGeneratedTranslations.txt, set GenerateStaticSubstitutionTranslations=True in the config file; the plugin will then output the entire translation instead.
I assume this is essentially what you want with the "existence/substitution" check.
About post-processing:
That is certainly possible. The plugin already does some post-processing to clean up translations so that they do not include characters that are often missing from the character sets used in games.
Replacing entire sentence parts this way would, however, require you to know the idiosyncrasies of the translator you're using for the rules to really make sense. (Bing may translate a specific type of word differently from Google, etc.)
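The kind of character clean-up mentioned above can be pictured as a simple character mapping applied to the translated text. The plugin's real replacement table is not shown in this thread, so the mapping below is a hypothetical example of replacing typographic characters that a game font might lack with plain ASCII.

```python
# Hypothetical mapping: typographic characters a game's font may lack,
# replaced with plain ASCII equivalents. Not the plugin's real table.
FALLBACKS = str.maketrans({
    "\u2026": "...",  # horizontal ellipsis
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
})


def clean_for_font(text: str) -> str:
    # str.translate replaces each mapped character; others are kept as-is.
    return text.translate(FALLBACKS)
```

A rule table like this stays translator-independent, unlike sentence-part rules, which depend on how a particular endpoint phrases its output.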
About regexes:
You can still do some of what you want with regexes. You can use splitter regexes (sr prefix) to split up a text and translate the parts individually. This is used in games such as AI Girl to translate the text that appears when you pick up a certain number of different types of items, so you don't need a million different combinations of translations.
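The splitter idea can be modelled as: match the whole line, translate each captured part on its own, then reassemble the parts in a template. This Python sketch is only a model of the concept; the pickup message, the output layout, and the item dictionary are hypothetical, and the actual sr syntax should be taken from the plugin's documentation.

```python
import re
from typing import Optional

# Hypothetical message type: "<item>を<count>個手に入れた" (obtained <count> <item>).
SPLITTER = re.compile(r"^(.+)を(\d+)個手に入れた$")
TEMPLATE = "Obtained {1} x {0}"  # hypothetical layout: {0}=item, {1}=count

# Stand-in for translating each captured part (cache lookup or endpoint call).
ITEM_NAMES = {"薬草": "Herb", "木材": "Wood"}


def translate_part(part: str) -> str:
    # Numbers and unknown parts pass through unchanged in this sketch.
    return ITEM_NAMES.get(part, part)


def split_and_translate(text: str) -> Optional[str]:
    match = SPLITTER.match(text)
    if match is None:
        return None  # not this message type; normal translation would apply
    parts = [translate_part(group) for group in match.groups()]
    return TEMPLATE.format(*parts)


print(split_and_translate("薬草を3個手に入れた"))   # Obtained 3 x Herb
print(split_and_translate("木材を12個手に入れた"))  # Obtained 12 x Wood
```

Each item name needs one translation, and any count works, instead of one stored translation per item and count combination.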