
Translation cache and regex rules parse

Open TokcDK opened this issue 5 years ago • 6 comments

Hi. While making a translation for another game I noticed something strange. A regex that is valid and tested in online regex testers does not work for strings that are produced while another splitter regex rule is being parsed. For example, suppose the cache txt contains the following lines:

sr:^<size%3D16><color%3D#688CC8>(.+)</color></size>\n\n<size%3D14><color%3D#D2B68F>(.+)</color></size>$=<size%3D16><color%3D#688CC8>$1</color></size>\n\n<size%3D14><color%3D#D2B68F>$2</color></size>
sr:^(.+)([-+][\d]{1,3})$=$1$2
攻撃力=Attack
  1. In game, XUA then parses the text: <size%3D16><color%3D#688CC8>攻撃力+205</color></size>\n\n<size%3D14><color%3D#D2B68F>blabla</color></size>
  2. From it, it extracts two strings for translation, and the first is 攻撃力+205
  3. It then parses 攻撃力+205, but this time it skips the rule sr:^(.+)([-+][\d]{1,3})$=$1$2 from the cache, sends the string to the translation service and adds the translation Attack power +205 to the generated cache txt.
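
As a sanity check, the two rules can be exercised outside the game. The sketch below uses Python's `re` module rather than the .NET regex engine the plugin runs on, and it assumes that `%3D` in the cache file is an escaped `=`. The names `OUTER` and `INNER` are illustrative only. It suggests that both patterns match on their own, so the skipped rule is more likely a parsing-order question than a regex error.

```python
import re

# The two splitter rules from the cache file, with %3D written as '='
# (assumption: %3D is the cache file's escape for '=').
OUTER = re.compile(
    r"^<size=16><color=#688CC8>(.+)</color></size>\n\n"
    r"<size=14><color=#D2B68F>(.+)</color></size>$"
)
INNER = re.compile(r"^(.+)([-+][\d]{1,3})$")

text = (
    "<size=16><color=#688CC8>攻撃力+205</color></size>\n\n"
    "<size=14><color=#D2B68F>blabla</color></size>"
)

# Step 1: the outer rule splits the text into two pieces.
first_piece, second_piece = OUTER.match(text).groups()

# Step 2: the inner rule splits the first piece into a name and a value.
name, value = INNER.match(first_piece).groups()

print(first_piece, second_piece, name, value)
```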

A second point: if 攻撃力=Attack is added to substitutions, XUA adds {{A}}+205={{A}}+205 to the generated cache txt, even though the string consists only of symbols, digits and a string from substitutions and could be parsed without being added to the cache. Of course, the + here is not a standard one, but it is still a symbol.

Or maybe I made a mistake somewhere?

TokcDK avatar Jun 01 '20 08:06 TokcDK

This can be handled fairly simply by the plugin.

First of all, recognize that in order for a text to be parsed multiple times through different regexes, you must enable text parser recursion within the plugin.

[Behaviour]
MaxTextParserRecursion=1         ;Indicates how many levels of recursion are allowed when text is parsed so it can be translated in different parts. This can be used with splitter-regexes in advanced scenarios. The default value of one essentially means that recursion is disabled.

This should be bumped to a larger value than 1.

Second of all, recognize that unless you want to change the actual rich text markup to something else, it is pointless to add it in a regex, because the plugin already handles it by default. Instead simply concentrate on the individual pieces of text contained between the markup "code".

Which means that if you bump that max recursion config, you should be able to get away with just:

sr:^(.+)([-+][\d]{1,3})$=$1$2

Although that is a very broadly hitting regex, so I would recommend boiling it down to only the known words that appear before such a "stat".
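
One way to narrow the rule is to replace `(.+)` with an alternation of known stat names. The sketch below checks the idea with Python's `re` module (the plugin itself uses the .NET regex engine). The `KNOWN_STATS` list is a hypothetical example built from words that appear in this thread, and `is_stat_line` is an illustrative helper, not part of the plugin.

```python
import re

# Hypothetical list of stat names; extend it with the words the game uses.
KNOWN_STATS = ["攻撃力", "防御力"]

# The broad rule from the thread and a narrowed variant of it.
BROAD = re.compile(r"^(.+)([-+][\d]{1,3})$")
NARROW = re.compile(
    r"^(" + "|".join(map(re.escape, KNOWN_STATS)) + r")([-+][\d]{1,3})$"
)

def is_stat_line(pattern, text):
    """Return True when the whole text matches the given rule."""
    return pattern.match(text) is not None

# The last sample is not a stat, but the broad rule still accepts it.
samples = ["攻撃力+205", "防御力-50", "Version 1-2"]
for sample in samples:
    print(sample, is_stat_line(BROAD, sample), is_stat_line(NARROW, sample))
```

In the cache file, the narrowed rule would presumably take the form sr:^(攻撃力|防御力)([-+][\d]{1,3})$=$1$2, though that exact line has not been tested here.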

See more examples using recursion here: https://github.com/bbepis/XUnity.AutoTranslator#splitter-regex

In the context of parsing texts recursively, the rich text parser is handled in the same way as a splitter regex, except it takes a lower priority (it will check if there are matches to regexes before trying to apply rich text behaviour).
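
The effect of the recursion limit can be pictured with a toy model. The Python sketch below is not the plugin's implementation: `translate`, `CACHE`, `SPLITTERS` and the `[MT:...]` marker (standing in for a call to the translation service) are all hypothetical, `%3D` is written as `=`, and the rich text parser is left out entirely. Treating text without letters as needing no translation is a simplification made for this sketch.

```python
import re

SPLITTERS = [
    re.compile(
        r"^<size=16><color=#688CC8>(.+)</color></size>\n\n"
        r"<size=14><color=#D2B68F>(.+)</color></size>$"
    ),
    re.compile(r"^(.+)([-+][\d]{1,3})$"),
]
CACHE = {"攻撃力": "Attack"}

def translate(text, max_recursion, level=0):
    """Toy model: split text with regexes up to max_recursion levels deep."""
    if text in CACHE:
        return CACHE[text]
    if level < max_recursion:
        for rule in SPLITTERS:
            match = rule.match(text)
            if match:
                result = text
                # Replace groups from last to first so earlier spans stay valid.
                for index in range(match.lastindex, 0, -1):
                    start, end = match.span(index)
                    piece = translate(match.group(index), max_recursion, level + 1)
                    result = result[:start] + piece + result[end:]
                return result
    # Simplification: text without letters (e.g. "+205") needs no translation.
    if not any(ch.isalpha() for ch in text):
        return text
    return "[MT:" + text + "]"

text = (
    "<size=16><color=#688CC8>攻撃力+205</color></size>\n\n"
    "<size=14><color=#D2B68F>blabla</color></size>"
)
print(translate(text, max_recursion=1))
print(translate(text, max_recursion=2))
```

With a limit of 1, the model only applies the outer rule and hands 攻撃力+205 to the stand-in translator as a whole. With a limit of 2, the inner rule also runs and the cached 攻撃力=Attack entry is used.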

gravydevsupreme avatar Jun 01 '20 13:06 gravydevsupreme

Thanks. I didn't know that MaxTextParserRecursion is for this purpose; I thought it worked by default.

Unfortunately, sr:^(.+)([-+][\d]{1,3})$=$1$2 alone is not enough: it does not work without the first regex, and if the required variation of the first regex is not added, something like this is added to the generated cache:

<size%3D16><color%3D#688CC8>クリティカル率+8%(服ビリ状態)</color></size>\n<size%3D16><color%3D#688CC8>経験値の獲得量+18%</color></size>\n<size%3D16><color%3D#688CC8>聖光の槍のスキル効果を+12%</color></size>\n<size%3D16><color%3D#688CC8>聖光ショックのスキル効果を+18%</color></size>\n\n<size%3D16><color%3D#1DD720>One Hit Man</color></size>\n<color%3D#858080><size%3D16>【2】セット効果</size>\n<size%3D15>全ての回復効果を禁止する</size>\n<size%3D15>防御力-50%</size>\n</color><color%3D#858080><size%3D16>【4】セット効果</size>\n<size%3D15>1%確率一撃必殺(ボス戦の時、ダメージはその総血量の10%)</size>\n</color><color%3D#858080><size%3D16>【6】セット効果</size>\n<size%3D15>4%確率一撃必殺(ボス戦の時、ダメージはその総血量の10%)</size>\n</color><color%3D#858080><size%3D16>【8】セット効果</size>\n<size%3D15>致死的な傷害を受けた場合、10%の確率でフルライフで復活する。</size>\n<size%3D15>次の攻撃は必ず一撃で仕留める</size>\n</color>\n<size%3D14><color%3D#D2B68F>その一発必殺のお兄さんの衣装をコピーしたもの。</color></size>=<size%3D16><color%3D#688CC8>Crit chance+8%(When undressed)</color></size>\n<size%3D16><color%3D#688CC8>Experience+18%</color></size>\n<size%3D16><color%3D#688CC8>Holy Light Spear skill effect+12%</color></size>\n<size%3D16><color%3D#688CC8>Holy shock skill effect+18%</color></size>\n\n<size%3D16><color%3D#1DD720>One Hit Man</color></size>\n<color%3D#858080><size%3D16>Amount of coins earned from monsters + 14%</size>\n<size%3D15>Prohibit all healing effects</size>\n<size%3D15>Defense-50%</size>\n</color><color%3D#858080><size%3D16> </size>\n<size%3D15>1% chance one shot killer (damage is 10% of the total blood volume in boss battle)</size>\n</color><color%3D#858080><size%3D16>CP+20%</size>\n<size%3D15>4% chance one shot killer (damage is 10% of the total blood volume in boss battle)</size>\n</color><color%3D#858080><size%3D16>[8] Set effect</size>\n<size%3D15>If you receive a fatal injury, there is a 10% chance that you will be revived at full life.</size>\n<size%3D15>Be sure to finish the next Attack with a single blow</size>\n</color>\n<size%3D14><color%3D#D2B68F>A copy of the one-shot deadly brother's costume.</color></size>

...but with the first regex it works when recursion is set to 2.

TokcDK avatar Jun 01 '20 15:06 TokcDK

When you say it does not work without the first regex, is it because the translation is wrong or because the full line is added to the cache?

The first would be a bug, but the second is expected behaviour at the moment.

gravydevsupreme avatar Jun 01 '20 15:06 gravydevsupreme

Without the 1st regex, I expected that it would not add the whole line

<size%3D16><color%3D#688CC8>クリティカル率+8%(服ビリ状態)</color></size>\n<size%3D16><color%3D#688CC8>経験値の獲得量+18%</color></size>\n<size%3D16><color%3D#688CC8>聖光の槍のスキル効果を+12%</color></size>\n<size%3D16><color%3D#688CC8>聖光ショックのスキル効果を+18%</color></size>\n\n<size%3D16><color%3D#1DD720>One Hit Man</color></size>\n<color%3D#858080><size%3D16>【2】セット効果</size>\n<size%3D15>全ての回復効果を禁止する</size>\n<size%3D15>防御力-50%</size>\n</color><color%3D#858080><size%3D16>【4】セット効果</size>\n<size%3D15>1%確率一撃必殺(ボス戦の時、ダメージはその総血量の10%)</size>\n</color><color%3D#858080><size%3D16>【6】セット効果</size>\n<size%3D15>4%確率一撃必殺(ボス戦の時、ダメージはその総血量の10%)</size>\n</color><color%3D#858080><size%3D16>【8】セット効果</size>\n<size%3D15>致死的な傷害を受けた場合、10%の確率でフルライフで復活する。</size>\n<size%3D15>次の攻撃は必ず一撃で仕留める</size>\n</color>\n<size%3D14><color%3D#D2B68F>その一発必殺のお兄さんの衣装をコピーしたもの。</color></size>=<size%3D16><color%3D#688CC8>Crit chance+8%(When undressed)</color></size>\n<size%3D16><color%3D#688CC8>Experience+18%</color></size>\n<size%3D16><color%3D#688CC8>Holy Light Spear skill effect+12%</color></size>\n<size%3D16><color%3D#688CC8>Holy shock skill effect+18%</color></size>\n\n<size%3D16><color%3D#1DD720>One Hit Man</color></size>\n<color%3D#858080><size%3D16>Amount of coins earned from monsters + 14%</size>\n<size%3D15>Prohibit all healing effects</size>\n<size%3D15>Defense-50%</size>\n</color><color%3D#858080><size%3D16> </size>\n<size%3D15>1% chance one shot killer (damage is 10% of the total blood volume in boss battle)</size>\n</color><color%3D#858080><size%3D16>CP+20%</size>\n<size%3D15>4% chance one shot killer (damage is 10% of the total blood volume in boss battle)</size>\n</color><color%3D#858080><size%3D16>[8] Set effect</size>\n<size%3D15>If you receive a fatal injury, there is a 10% chance that you will be revived at full life.</size>\n<size%3D15>Be sure to finish the next Attack with a single blow</size>\n</color>\n<size%3D14><color%3D#D2B68F>A copy of the one-shot deadly brother's costume.</color></size>

of the item description to the cache, and would just replace it with the translated variant if regex 2, sr:^(.+)([-+][\d]{1,3})$=$1$2, exists and all translations for (.+) are already in the cache. I expected it to add to the cache only translations for text that is not there yet, because there will be thousands of variants of generated items and their effects. But when the 1st regex is added, it finds it, then parses the 2nd regex and adds only the text of the main effects and its translation, like 攻撃力=Attack, without needing to add thousands of variants of the full item description like the one above.

What I mostly understood is that when I added the sr:^(.+)([-+][\d]{1,3})$=$1$2 regex and changed recursion to 2, it parsed the pieces of text between the formatting tags with that regex. As I understand it, the only way to prevent the full item description with all its formatting tags (like the one above) from being added is to add the required variant of the 1st regex; then only the effect names are added to the cache.

TokcDK avatar Jun 01 '20 18:06 TokcDK

The "second" behaviour is an artifact of how rich text traditionally has been handled.

If there is just rich text that is being parsed, then the user generally expects the entire string to be output to the cache file, and not just the individual elements of text between the markup. But what if there is both rich text AND a regex is involved?

Currently the rule is that the outer-most text parser gets to decide how the result gets cached. And in this case, if there is no "first regex" the rich text parser gets to decide and it says it wants to output the entire line to the cache.
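
The rule can be pictured with a small toy model. The Python sketch below is only an illustration of the behaviour described in this thread, not the plugin's code: the `Parser` class, its `caches_whole_line` flag and `entries_to_cache` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Parser:
    name: str
    caches_whole_line: bool  # hypothetical flag describing the caching choice

# Per the thread: rich text wants the whole line cached, a splitter regex does not.
RICH_TEXT = Parser("rich text parser", caches_whole_line=True)
SPLITTER_REGEX = Parser("splitter regex", caches_whole_line=False)

def entries_to_cache(full_line, pieces, parser_chain):
    """Return what gets cached; only the outermost parser decides."""
    outermost = parser_chain[0]
    if outermost.caches_whole_line:
        return [full_line]
    return list(pieces)

line = "<size=15>防御力-50%</size>"
pieces = ["防御力"]

# No "first regex": rich text is outermost, so the entire line is cached.
print(entries_to_cache(line, pieces, [RICH_TEXT, SPLITTER_REGEX]))
# A splitter regex is outermost: only the individual pieces are cached.
print(entries_to_cache(line, pieces, [SPLITTER_REGEX, RICH_TEXT]))
```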

What would be the most correct thing to do in such a situation? I am really not sure. Perhaps it should be configurable. :)

gravydevsupreme avatar Jun 01 '20 20:06 gravydevsupreme

It would be good to have a config option for this: one that selects whether the full line needs to be added when all of its elements are already in the cache, or whether only the new parts of the line are added, without any formatting tags. That would avoid adding many MB of text with thousands of combinations of one line, as in the case above, where item descriptions are in the style of Diablo items, with several item types and many combinations of possible effects.

TokcDK avatar Jun 02 '20 04:06 TokcDK