[i18n] Tooling to detect inconsistent translation and formatting Japanese text
@svrnm @chalin Thank you for supporting each language localization! At least, the localization for Japanese could make a smooth start thanks to the great automation tools.
Now I'm starting to feel that we need an automation tool to detect inconsistent translations and Japanese formatting issues, so that we can take contributions from more people. How can I add that feature to the repo? Tricky parts are:
- Only lang:ja labeled PRs need the check
- The tool needs to have a set of mappings between correct and incorrect wordings (in some file)
What do you think about this idea?
Originally posted by @ymotongpoo in https://github.com/open-telemetry/opentelemetry.io/discussions/4302#discussioncomment-9870096
hey @ymotongpoo, I copied your comment from the discussion to a dedicated issue because this is a scoped task.
We have tooling for that for English content already, but turned it off for translations, since it requires additional language-specific configuration. So here is what we need to do:
- for each tool create a configuration for Japanese that is only applied to files in content/ja
- enable them locally and test those tools if they work properly
- enable them in the GitHub actions workflows
The tools are:
- [ ] cspell for spell checking
- [ ] prettier for formatting
- [ ] textlint (especially "terminology") for consistent translations
We should address them 1-by-1
@svrnm thank you for routing the topic to the issue.
- cspell doesn't work well for Japanese and for this purpose.
- it is said prettier 3.0.0+ works well with Japanese (ref: https://github.com/prettier/prettier/pull/11597)
- textlint is the most fitting and common tool for this kind of Japanese document. (ref: an example for Japanese technical magazine https://gist.github.com/inao/f55e8232e150aee918b9)
* cspell doesn't work well for Japanese and for this purpose.
I was not aware of that, I did a quick check and indeed it only supports English so far, that's interesting. Do you have any recommendations for a spell checker that works well for Japanese?
* it is said prettier 3.0.0+ works well with Japanese (ref: [Markdown[next branch]: Do not insert spaces between Chinese/Japanese & latin letters prettier/prettier#11597](https://github.com/prettier/prettier/pull/11597))
If it works well, then let's have a PR to introduce it for Japanese. If you provide the language-specific config (if needed/where needed), @chalin or I can handle the rest
* textlint is the most fitting and common tool for this kind of Japanese document. (ref: an example for Japanese technical magazine [gist.github.com/inao/f55e8232e150aee918b9](https://gist.github.com/inao/f55e8232e150aee918b9))
That's great to hear! Again, if you can provide us with some guidance (via a PR) with the required config etc, we can take a look into integrating it into the workflows
@svrnm cc @open-telemetry/docs-ja-approvers
Hello.
To make my Japanese translation more efficient, I created otel-localiztion-ja.yml (a prh file) and procedures in the repository below.
https://github.com/Msksgm/textlint-rule-preset-unofficial-otel-localization-ja
If I run npm run check:text with otel-localiztion-ja.yml (a prh file), it reports results that we should fix.
I hope it can be included in open-telemetry/opentelemetry.io, so that other approvers can revise it and other Japanese committers can use it.
Can I create a PR to add this repository? I assume this command would run as part of the lint step in GitHub Actions.
That's great! ... if @open-telemetry/docs-ja-approvers agree with the rules in https://github.com/Msksgm/textlint-rule-preset-unofficial-otel-localization-ja/blob/main/otel-localiztion-ja.yml.
@chalin cc @open-telemetry/docs-ja-approvers
That's a really good point. I understand that my rules are not enough. When I create a PR, I would like a lot of advice from @ymotongpoo and @katzchang, because they are professional Japanese translators in OTel and o11y. They have also translated o11y books into Japanese, such as the books below.
- https://www.oreilly.co.jp/books/9784814400126/
- https://www.oreilly.co.jp/books/9784814401031/
- https://www.oreilly.co.jp/books/9784814401024/
@Msksgm thank you, Masaki, for adding this! this feature has been longed for for a while and we're really happy with this PR.
@chalin the current rules are very basic ones and those we've pointed out repeatedly. my concern from another aspect is whether it's possible to make this Japanese textlint check optional. As we grow the rule file, we will see cases where a word is used correctly in context but textlint still warns. Even if we make the checker optional, we can still ask the contributor to confirm the textlint suggestions. WDYT?
Yes it is possible to disable text lint in some cases, with special comments. (From memory, I think that we mapped it to the prettier-ignore directives.) I'll be OOO, and will let @svrnm or @theletterf follow through if you need further guidance before I'm back.
@chalin @svrnm @theletterf @open-telemetry/docs-ja-approvers I created the PR below as an experiment. Please check whether it looks good.
https://github.com/open-telemetry/opentelemetry.io/pull/7160
@katzchang @ymotongpoo
I want to add more words in prh/ja.yml.
Please review it.
apologies for the delay here, this is really some great work, I'll take a look into the PR right now!
@svrnm
Thank you for your review. I will check how effective it is for our Japanese translation after one month.
thanks @Msksgm, I'll keep this issue open and we'll revisit it a month from now
Hi @svrnm,
It has been about a month since I merged the PR. I found it very helpful and effective for Japanese translators. I’d like to continue using it.
One issue is about adding new translation rules: for example, converting English words like Collector to コレクター and Exporter to エクスポーター. This kind of change would also impact other language pages, since those English terms may appear there as well. If we seriously want to add such specific rules for Japanese, I believe we’d need to update the GitHub Actions workflow to handle it.
However, we are actually satisfied with the current rules, and we don’t need the rule regarding these word conversions for now.
If you are okay with it, could you please close this PR?
Thanks again!
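The scoping concern raised above (a rule such as Collector to コレクター must not fire on pages in other languages) can be sketched in a few lines of Python. This is only an illustration, not textlint or prh code: `JA_ONLY_TERMS`, `is_japanese_page`, and `suggestions` are hypothetical names, the file paths are made up, and the only assumption taken from this thread is that Japanese pages live under content/ja.

```python
from pathlib import PurePosixPath

# Hypothetical Japanese-only replacements, taken from the examples in the comment above.
JA_ONLY_TERMS = {"Collector": "コレクター", "Exporter": "エクスポーター"}


def is_japanese_page(path):
    """True for Markdown files under content/ja (assumed location of Japanese pages)."""
    p = PurePosixPath(path)
    return p.parts[:2] == ("content", "ja") and p.suffix == ".md"


def suggestions(path, text):
    """Suggest Japanese wordings, but only for Japanese pages."""
    if not is_japanese_page(path):
        return {}
    return {en: ja for en, ja in JA_ONLY_TERMS.items() if en in text}


# Hypothetical paths: the Japanese page gets a suggestion, the Spanish page does not.
print(suggestions("content/ja/docs/collector/_index.md", "Collector の設定"))
print(suggestions("content/es/docs/collector/_index.md", "El Collector recibe datos"))
```

The first call prints `{'Collector': 'コレクター'}` and the second prints `{}`, which is the behavior a path-scoped rule set would need.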
Thanks all. A few comments before we close this:
- @ymotongpoo had a concern of the checks being optional. Is this still a concern, because as far as I know, the rules are always applied, right @Msksgm?
- I think that the core textlint functionality would have been sufficient, rather than bringing in the prh plugin. WDYT @Msksgm?
- If we do keep the prh plugin, I'd like to move the config file. (But that might be unnecessary if we fall back on the core textlint functionality.)
prh is one of the easiest ways to check the consistency of Japanese. However, as @Msksgm raised, it needs to check English terms left in the Japanese translation. Thus, it would be great if we could limit the scope for prh.
Keeping prh optional is totally fine with me, because we still need some flexibility in the translation.
Thank you @ymotongpoo for your comment. Ok, we'll look into:
- [ ] Scoping prh (or at least the application of its config) to Japanese
- [ ] Relocate the prh config so that it is less conspicuous (not in a top-level prh folder, even if simply renaming it to .prh).
@Msksgm and @ymotongpoo - I'd rather avoid bringing the prh tool and its textlint plugin into our tool-chain soup, if possible. Given the simple use cases you seem to have, I believe that we could use the existing textlint-rule-terminology plugin instead with this added configuration, as an example:
```yaml
# .textlintrc.yml
rules:
  terminology:
    defaultTerms: false
    skip: []
    terms:
      # ...
overrides:
  - files: ['content/ja/**/*.md']
    rules:
      terminology:
        terms:
          - ['テレメトリ(?!ー)', 'テレメトリー']
          - ['コレクタ(?!ー)', 'コレクター']
          - ['メトリック', 'メトリクス']
          - ['例えば', 'たとえば']
          - ['全て', 'すべて']
          - ['代わり', 'かわり']
          - ['代わって', 'かわって']
          - ['インストゥルメンテーション|インストルメンテーション|インストルメント', '計装']
          - ['例示値', 'エグザンプラー']
          - ['プログラマティック', 'プログラム']
          - ['伝播|プロパゲーション', '伝搬']
```
The solution above (if it works) gives us the functionality we desire, which is to apply the Japanese rules only to the Japanese pages.
What do you think? I haven't tested this yet. Could you test that out? If not, I'll test it out when I have the time.
If you anticipate that there's more functionality that you'll need from prh, let us know.
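To illustrate how the term pairs in the proposed configuration are meant to behave, here is a minimal Python sketch. It is not textlint or textlint-rule-terminology code. It assumes each pair is read as (regex pattern, preferred wording); `TERMS` copies three pairs from the config above, and `find_violations` is a hypothetical helper. It mainly shows why the `(?!ー)` negative lookahead keeps already-correct text from being flagged.

```python
import re

# A few (pattern, preferred) pairs copied from the proposed config above.
TERMS = [
    (r"テレメトリ(?!ー)", "テレメトリー"),
    (r"コレクタ(?!ー)", "コレクター"),
    (r"伝播|プロパゲーション", "伝搬"),
]


def find_violations(text):
    """Return (matched, preferred) tuples for each discouraged wording found."""
    violations = []
    for pattern, preferred in TERMS:
        for match in re.finditer(pattern, text):
            violations.append((match.group(0), preferred))
    return violations


# "コレクタ" lacks the trailing "ー" and is reported; "テレメトリー" is already correct.
print(find_violations("コレクタはテレメトリーを受信します"))
```

This prints `[('コレクタ', 'コレクター')]`. Without the lookahead, the correct spelling テレメトリー would also match the shorter pattern and be reported on every run.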
/cc @vitorvasc @theletterf
@chalin
Thank you for your suggestion. I tried it. However, it does not work well, as shown in the PR below.
- https://github.com/open-telemetry/opentelemetry.io/pull/7671