
[i18n] Tooling to detect inconsistent translations and formatting in Japanese text

svrnm opened this issue.

@svrnm @chalin Thank you for supporting localization in every language! The Japanese localization, at least, got off to a smooth start thanks to the great automation tools.

I've now started to feel that we need an automation tool that detects inconsistent translations and Japanese formatting issues, so that we can accept contributions from more people. How can I add such a feature to the repo? The tricky parts are:

  • Only lang:ja labeled PRs need the
  • The tool needs a set of mappings between correct and incorrect wordings (stored in some file)
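As a rough illustration of those two requirements, here is a minimal TypeScript sketch. It assumes the mappings are stored as `[incorrectPattern, correctWording]` pairs and that the PR's labels are available as a string array. The names `TermRule`, `shouldCheck`, and `findInconsistentTerms` are hypothetical, and the sample pairs come from the terminology list proposed later in this thread. Tools such as textlint or prh implement this idea far more completely.

```typescript
// Hypothetical rule format: a regex source for the incorrect wording
// and the preferred replacement.
type TermRule = [incorrect: string, correct: string];

interface Finding {
  index: number; // offset of the match in the text
  found: string; // the text that matched the incorrect pattern
  suggestion: string; // the preferred wording
}

// Only PRs carrying the lang:ja label are checked.
function shouldCheck(labels: string[]): boolean {
  return labels.includes("lang:ja");
}

// Scan the text once per rule and collect every match, sorted by offset.
function findInconsistentTerms(text: string, rules: TermRule[]): Finding[] {
  const findings: Finding[] = [];
  for (const [incorrect, correct] of rules) {
    const re = new RegExp(incorrect, "g");
    for (const m of text.matchAll(re)) {
      findings.push({ index: m.index ?? 0, found: m[0], suggestion: correct });
    }
  }
  return findings.sort((a, b) => a.index - b.index);
}

const rules: TermRule[] = [
  ["例えば", "たとえば"],
  ["全て", "すべて"],
];

if (shouldCheck(["lang:ja"])) {
  const text = "例えば、全てのシグナルを送信します。";
  for (const f of findInconsistentTerms(text, rules)) {
    console.log(`${f.index}: "${f.found}" -> "${f.suggestion}"`);
  }
}
```

Run against 例えば、全てのシグナルを送信します。, the sketch reports 例えば at offset 0 and 全て at offset 4, each with its preferred wording.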

What do you think about this idea?

Originally posted by @ymotongpoo in https://github.com/open-telemetry/opentelemetry.io/discussions/4302#discussioncomment-9870096

svrnm, Jun 25 '24

hey @ymotongpoo, I copied your comment from the discussion to a dedicated issue because this is a scoped task.

We already have tooling for that for English content, but we turned it off for translations, since it requires additional language-specific configuration. So here is what we need to do:

  • for each tool, create a Japanese configuration that is applied only to files in content/ja
  • enable them locally and test whether they work properly
  • enable them in the GitHub actions workflows

The tools are:

  • [ ] cspell for spell checking
  • [ ] prettier for formatting
  • [ ] textlint (especially "terminology") for consistent translations

We should address them one by one.

svrnm, Jun 25 '24

@svrnm, thank you for moving the topic to this issue.

  • cspell doesn't work well for Japanese and for this purpose.
  • it is said prettier 3.0.0+ works well with Japanese (ref: https://github.com/prettier/prettier/pull/11597)
  • textlint is the most fitting and common tool for this kind of Japanese document. (ref: an example for Japanese technical magazine https://gist.github.com/inao/f55e8232e150aee918b9)

ymotongpoo, Jun 26 '24

> cspell doesn't work well for Japanese and for this purpose.

I was not aware of that. I did a quick check, and indeed it only supports English so far; that's interesting. Do you have any recommendations for a spell checker that works well for Japanese?

> it is said prettier 3.0.0+ works well with Japanese (ref: [Markdown[next branch]: Do not insert spaces between Chinese/Japanese & latin letters prettier/prettier#11597](https://github.com/prettier/prettier/pull/11597))

If it works well, then let's have a PR to introduce it for Japanese. If you provide the language-specific config (if and where needed), @chalin or I can handle the rest.

> textlint is the most fitting and common tool for this kind of Japanese document. (ref: an example for Japanese technical magazine [gist.github.com/inao/f55e8232e150aee918b9](https://gist.github.com/inao/f55e8232e150aee918b9))

That's great to hear! Again, if you can provide us with some guidance (via a PR) with the required config, etc., we can look into integrating it into the workflows.

svrnm, Jun 26 '24

@svrnm cc @open-telemetry/docs-ja-approvers

Hello.

To make my Japanese translation work more efficient, I created otel-localiztion-ja.yml (a prh file) and usage instructions in the repository below.

https://github.com/Msksgm/textlint-rule-preset-unofficial-otel-localization-ja

When I run npm run check:text with otel-localiztion-ja.yml (a prh file), we get the results below, which we should fix. I hope it can be included in open-telemetry/opentelemetry.io so that other approvers can revise it and other Japanese committers can use it. Can I create a PR to add it to this repository? I assume this command would run as part of the lint job in GitHub Actions.

[Image: results of npm run check:text]

Msksgm, Jun 19 '25

That's great! ... if @open-telemetry/docs-ja-approvers agree with the rules in https://github.com/Msksgm/textlint-rule-preset-unofficial-otel-localization-ja/blob/main/otel-localiztion-ja.yml.

chalin, Jun 19 '25

@chalin cc @open-telemetry/docs-ja-approvers

That's a really good point. I understand that my rules are not sufficient yet. When I create the PR, I'd like plenty of advice from @ymotongpoo and @katzchang, because they are professional Japanese translators in OTel and o11y. They have also translated o11y books into Japanese, such as the books below.

  • https://www.oreilly.co.jp/books/9784814400126/
  • https://www.oreilly.co.jp/books/9784814401031/
  • https://www.oreilly.co.jp/books/9784814401024/

Msksgm, Jun 19 '25

@Msksgm, thank you, Masaki, for adding this! We have wanted this feature for a while, and we're really happy with this PR.

@chalin, the current rules are very basic ones that we've pointed out repeatedly. A separate concern of mine: is it possible to make this Japanese textlint check optional? As the rule file grows, we will likely see cases where a word is used correctly in context but textlint still warns. Even if the check is optional, we can still ask contributors to review textlint's suggestions. WDYT?

ymotongpoo, Jun 19 '25

Yes, it is possible to disable textlint in specific places with special comments. (From memory, I think we mapped these to the prettier-ignore directives.) I'll be OOO, so I'll let @svrnm or @theletterf follow through if you need further guidance before I'm back.
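The idea of exempting passages via special comments can be sketched as follows. This minimal TypeScript sketch assumes the exemption markers are Prettier's Markdown range-ignore comments, `<!-- prettier-ignore-start -->` and `<!-- prettier-ignore-end -->`, which the comment above recalls only from memory. `stripIgnoredRegions` is a hypothetical helper; real textlint filter rules work on the parsed document rather than on raw strings. Ignored text is blanked out with spaces instead of deleted, so any offsets reported by a later check still point at the original positions.

```typescript
// Assumed markers: Prettier's range-ignore comments for Markdown.
const START = "<!-- prettier-ignore-start -->";
const END = "<!-- prettier-ignore-end -->";

// Blank out everything between START and END (markers included) so that
// later checks skip it, keeping every other character at its original
// offset and preserving line breaks for line-number reporting.
function stripIgnoredRegions(markdown: string): string {
  let result = "";
  let pos = 0;
  while (pos < markdown.length) {
    const start = markdown.indexOf(START, pos);
    if (start === -1) break;
    const endMarker = markdown.indexOf(END, start + START.length);
    // An unterminated region extends to the end of the document.
    const end = endMarker === -1 ? markdown.length : endMarker + END.length;
    result += markdown.slice(pos, start);
    result += markdown.slice(start, end).replace(/[^\n]/g, " ");
    pos = end;
  }
  return result + markdown.slice(pos);
}

const doc = ["全てのシグナル", START, "全てをそのまま残す", END].join("\n");

console.log(stripIgnoredRegions(doc).includes("全てをそのまま残す")); // false
```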

chalin, Jun 20 '25

@chalin @svrnm @theletterf @open-telemetry/docs-ja-approvers I created the PR below as an experiment. Please let me know whether it looks good.

https://github.com/open-telemetry/opentelemetry.io/pull/7160

@katzchang @ymotongpoo

I'd like to add more words to prh/ja.yml. Please review them.

Msksgm, Jun 21 '25

Apologies for the delay here. This is really great work; I'll take a look at the PR right now!

svrnm, Jul 16 '25

@svrnm

Thank you for your review. I'll evaluate how effective it is for our Japanese translation after a month.

Msksgm, Jul 16 '25

Thanks @Msksgm, I'll keep this issue open, and we'll revisit it a month from now.

svrnm, Jul 16 '25

Hi @svrnm,

It has been about a month since I merged the PR. I found it very helpful and effective for Japanese translators. I’d like to continue using it.

One issue is about adding new translation rules: for example, converting English words like Collector to コレクター and Exporter to エクスポーター. This kind of change would also impact other language pages, since those English terms may appear there as well. If we seriously want to add such specific rules for Japanese, I believe we’d need to update the GitHub Actions workflow to handle it.
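One way to picture the scoping such a workflow change would need: Japanese-only rules, like the Collector to コレクター conversion mentioned above, are added only for files under `content/ja/`, so English terms on other locales' pages are left alone. This is a minimal TypeScript sketch under that assumption; `TermRule`, `sharedRules`, `jaRules`, and `rulesFor` are hypothetical names, and the file paths are illustrative.

```typescript
// Hypothetical rule format: [incorrect pattern, preferred wording].
type TermRule = [string, string];

// Rules that are safe to apply to every locale (none in this sketch).
const sharedRules: TermRule[] = [];

// Japanese-only rules, e.g. converting English terms left untranslated.
const jaRules: TermRule[] = [
  ["Collector", "コレクター"],
  ["Exporter", "エクスポーター"],
];

// Pick the rule set from the file path, mirroring a content/ja/**/*.md scope.
function rulesFor(path: string): TermRule[] {
  const isJapanesePage = path.startsWith("content/ja/") && path.endsWith(".md");
  return isJapanesePage ? [...sharedRules, ...jaRules] : sharedRules;
}

console.log(rulesFor("content/ja/docs/collector/_index.md").length); // 2
console.log(rulesFor("content/en/docs/collector/_index.md").length); // 0
```

Checking the path prefix keeps the sketch dependency-free; in practice the same scope would more likely be written as a `content/ja/**/*.md` glob in the linter or workflow configuration.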

That said, we are satisfied with the current rules, and we don't need these word-conversion rules for now.

If you are okay with it, could you please close this issue?

Thanks again!

Msksgm, Aug 20 '25

Thanks all. A few comments before we close this:

  • @ymotongpoo raised a concern about making the checks optional. Is this still a concern? As far as I know, the rules are always applied, right @Msksgm?
  • I think that the core textlint functionality would have been sufficient, rather than bringing in the prh plugin. WDYT @Msksgm?
  • If we do keep the prh plugin, I'd like to move the config file (but that might be unnecessary if we fall back on the core textlint functionality).

chalin, Aug 28 '25

prh is one of the easiest ways to check the consistency of Japanese text. However, as @Msksgm raised, it also needs to check English terms left in the Japanese translation. Thus, it would be great if we could limit the scope of prh.

Keeping prh optional is totally fine with me, because we still need some flexibility in translation.

ymotongpoo, Aug 28 '25

Thank you @ymotongpoo for your comment. Ok, we'll look into:

  • [ ] Scoping prh (or at least the application of its config) to Japanese
  • [ ] Relocating the prh config so that it is less conspicuous (not in a top-level prh folder -- even if simply renaming it to .prh).

chalin, Aug 28 '25

@Msksgm and @ymotongpoo, I'd rather avoid bringing the prh tool and its textlint plugin into our toolchain soup, if possible. Given the simple use cases you seem to have, I believe we could instead use the existing textlint-rule-terminology plugin with added configuration like the following example:

# .textlintrc.yml
rules:
  terminology:
    defaultTerms: false
    skip: []
    terms:
      # ...

overrides:
  - files: ['content/ja/**/*.md']
    rules:
      terminology:
        terms:
          - ['テレメトリ(?!ー)', 'テレメトリー']
          - ['コレクタ(?!ー)', 'コレクター']
          - ['メトリック', 'メトリクス']
          - ['例えば', 'たとえば']
          - ['全て', 'すべて']
          - ['代わり', 'かわり']
          - ['代わって', 'かわって']
          - ['インストゥルメンテーション|インストルメンテーション|インストルメント', '計装']
          - ['例示値', 'エグザンプラー']
          - ['プログラマティック', 'プログラム']
          - ['伝播|プロパゲーション', '伝搬']

The solution above (if it works) gives us the functionality we want, which is to apply the Japanese rules only to the Japanese pages.
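The `(?!ー)` negative lookahead in the first two terms is what keeps the rule from flagging text that already uses the preferred long-vowel spelling: テレメトリ matches only when it is not followed by ー. The TypeScript sketch below demonstrates this with plain JavaScript regular expressions. It assumes textlint-rule-terminology hands these patterns to the regular expression engine unchanged, which has not been verified here (the rule may add its own word-boundary handling); `flags` is a hypothetical helper.

```typescript
// Two [pattern, replacement] pairs copied from the configuration above.
const terms: Array<[string, string]> = [
  ["テレメトリ(?!ー)", "テレメトリー"],
  ["コレクタ(?!ー)", "コレクター"],
];

// Hypothetical helper: one suggested replacement per match, in term order.
function flags(sentence: string): string[] {
  const suggestions: string[] = [];
  for (const [pattern, replacement] of terms) {
    const matches: string[] = sentence.match(new RegExp(pattern, "g")) ?? [];
    suggestions.push(...matches.map(() => replacement));
  }
  return suggestions;
}

// The short forms are flagged...
console.log(flags("テレメトリをコレクタに送る"));
// ...but the preferred long-vowel forms are left alone.
console.log(flags("テレメトリーをコレクターに送る"));
```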

What do you think? I haven't tested this yet. Could you test that out? If not, I'll test it out when I have the time.

If you anticipate that there's more functionality that you'll need from prh, let us know.

/cc @vitorvasc @theletterf

chalin, Aug 28 '25

@chalin

Thank you for your suggestion. I tried it, but it does not work well in the PR below.

  • https://github.com/open-telemetry/opentelemetry.io/pull/7671

Msksgm, Aug 29 '25