
Support multi-language content and pages with translations

Open choldgraf opened this issue 9 months ago • 2 comments

For larger communities and knowledge bases with many types of readers, it is common to maintain the same content in multiple languages and to host each language separately. This allows, for example, English speakers to access documentation at mydocs.org/en/ and French speakers to access documentation at mydocs.org/fr/.

This makes it much easier for communities to make their content more accessible to a larger population of readers.

Suggested improvement

It should be possible to:

  • Maintain multiple languages of the same content
  • When a document is built, create outputs for each of those languages
  • For hosting online, have a workflow to easily host each at a different location / URL / sub-folder
  • For each, the theme's UI elements should be in the language of the content (ref: #166)
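As a rough illustration of the list above, here is a minimal Python sketch that plans one build per language and gives each its own URL sub-folder. It assumes a hypothetical layout with one source folder per language (for example `content/en` and `content/fr`). The `LanguageBuild` type and `plan_language_builds` function are made up for illustration and are not an existing mystmd feature. How the URL prefix is passed to the build tool is left open.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LanguageBuild:
    """One planned build: where the sources live and where the output is served."""
    language: str
    source_dir: Path
    base_url: str


def plan_language_builds(content_root, languages, site_prefix=""):
    """Return one build plan per language code.

    Assumes a hypothetical layout of <content_root>/<language>/ per translation.
    """
    root = Path(content_root)
    prefix = site_prefix.rstrip("/")
    return [
        LanguageBuild(lang, root / lang, f"{prefix}/{lang}")
        for lang in languages
    ]


if __name__ == "__main__":
    for build in plan_language_builds("content", ["en", "fr"]):
        # Each language is built separately and published under its own sub-folder.
        print(f"{build.language}: build {build.source_dir} -> serve at {build.base_url}/")
```

Keeping the plan separate from the build step means the same list could drive a local build loop or a CI matrix, whichever the hosting workflow ends up needing.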

Related issues and references

  • This one covers UI and theme elements that need different languages: https://github.com/jupyter-book/mystmd/issues/166

choldgraf avatar Feb 27 '25 16:02 choldgraf

This might provide some ideas: https://teachbooks.io/manual/external/Sphinx-launch-buttons/README.html#multilingual-book

FreekPols avatar Apr 01 '25 08:04 FreekPols

Copying from: https://github.com/jupyter-book/mystmd/issues/166#issuecomment-2359594108

QuantEcon is currently undertaking a project assessing how AI can be used to assist with translations to other languages. We are exploring a range of tooling including ChatGPT, various OpenAI models, and service layers such as Crowdin, which has support for open-source projects.

Our vision is to develop an open-source compatible workflow that uses AI to get to the 90% mark in the translation, leaving the last 10% to human proof-reading and updates. This is the approach we are taking in our first translation project of the introductory lecture series.

There are various technical aspects to this project in the medium to longer term including:

1. How to enable good version control of source material in multiple languages. It would be nice to have tooling that allows for better line-by-line level control in the translation process for bi-directional updates from, for example, English -> Chinese, or from Chinese -> English. There may not always be a single source of truth in future projects, so bi-directional updating would be really useful.

2. Enabling an interface that allows reader feedback to improve translations (that isn't a pull request)
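One way to picture the segment-level tracking described in point 1 is the sketch below. It is an assumption-laden illustration, not QuantEcon's tooling: it splits a document into paragraphs on blank lines, fingerprints each paragraph, and compares against a hypothetical record of the fingerprints that were current when each paragraph was last translated. The function names and the record format are invented for the example.

```python
import hashlib


def split_segments(text):
    """Split a document into paragraph-level segments on blank lines."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def fingerprint(segment):
    """Short, stable hash of a segment's text."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()[:12]


def stale_segments(source_text, recorded_fingerprints):
    """Return indices of source segments that changed since translation.

    recorded_fingerprints maps a segment index to the hash that segment had
    when its translation was last updated (a hypothetical sidecar record).
    """
    stale = []
    for index, segment in enumerate(split_segments(source_text)):
        if recorded_fingerprints.get(index) != fingerprint(segment):
            stale.append(index)
    return stale


if __name__ == "__main__":
    old = "Intro paragraph.\n\nSecond paragraph."
    record = {i: fingerprint(s) for i, s in enumerate(split_segments(old))}
    new = "Intro paragraph.\n\nSecond paragraph, revised."
    print(stale_segments(new, record))  # [1]
```

Because the check only compares one text against a stored record, it can be run in either direction (English to Chinese or Chinese to English), which is the bi-directional case raised above. Matching by position is fragile when paragraphs are inserted or reordered, so a real workflow would likely need stable segment identifiers.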

Sounds very cool @mmcky! Is there anywhere I can read about how that's going?

da5nsy avatar Apr 01 '25 16:04 da5nsy

This might provide some ideas: https://teachbooks.io/manual/external/Sphinx-launch-buttons/README.html#multilingual-book

There's now a nice live example of this here: https://teachbooks.io/files-and-folders/EN/intro.html via https://github.com/TeachBooks/Sphinx-launch-buttons/issues/3#issuecomment-2770504348

da5nsy avatar Apr 10 '25 15:04 da5nsy

Thanks @da5nsy, and sorry for the delay in getting back to you.

Sounds very cool @mmcky! Is there anywhere I can read about how that's going?

To deliver the translations that have been requested so far, we are working on independent translations: we translate our source notebooks using Claude AI (this time around) and then edit them by hand. For example, this is what we are working on at the moment:

https://github.com/QuantEcon/lecture-python.zh-cn

Once we get through this period of translating (using more manual workflows), we want to take what we have learnt and move to our next step, which is to bring more automation to the workflow, either through an extension or otherwise. We have had a look around at some of the service layers available (such as Crowdin), but they don't work nicely with MyST markup.

mmcky avatar May 09 '25 01:05 mmcky

Thanks @mmcky. In case it's useful, here's where we're at for The Turing Way: https://book.the-turing-way.org/community-handbook/translation

da5nsy avatar May 09 '25 05:05 da5nsy

Thanks for the link @da5nsy, I will take a look for sure.

It will be great to see how you're approaching this issue. Once I've read through, perhaps we can link up some time to understand the technical crossover and how we might be able to help each other :-).

mmcky avatar May 09 '25 05:05 mmcky

@mmcky Sounds good! We have a regular infrastructure working group meeting every second Tuesday at 16:00 UTC, and a regular translation working group meeting every second Thursday at 15:00 UTC (join the corresponding Slack channels for further info).

da5nsy avatar May 09 '25 05:05 da5nsy

Just pinging the subscribers to this thread to let y'all know that I've "broken ground" on getting another language deployed on TTW: https://github.com/the-turing-way/the-turing-way/pull/4289

Input and contributions welcome!

We have an infrastructure WG meeting this afternoon at 4 pm UK time. Please do join if you fancy chatting synchronously about it.

da5nsy avatar Jul 08 '25 08:07 da5nsy