Adopt Translation & Localization Management Platform for docs

Open MaxymVlasov opened this issue 1 year ago • 47 comments

This is a Feature Request

Support of Translation & Localization Management Platform (TLMP) for docs

To make and maintain translations effectively, we need to adopt tools that were created for the job. There are a number of solutions, such as Transifex, Crowdin, and others.

After some quick research at the start of 2023, we chose Transifex for a PoC as a free and reliable solution. In the end, though, it doesn't matter which tool sig-docs chooses: any of them will be much better than the workflow we have now via git and GitHub.

What would you like to be added

There are 3 ways, from best to worst:

  1. Full integration. Integrate a TLMP with k/website and move localization work fully to the chosen TLMP, including docs reviews and CLA signing. Add the TLMP's GitHub integration and auto-merge changes sent by the TLMP integration user (provided by the TLMP, or you can set up your own).
  2. Partial integration. Translate and review in the TLMP, but users still need to sign the CLA on GitHub.
    1. Fix the EasyCLA co-author issue (see the Linux Foundation issue and the test PR)
    2. Create a script that uses the TLMP API to get the last editors of each line in the changed file and adds them as co-authors in git
    3. Auto-merge such changes if the EasyCLA check passes (review is already done in the TLMP)
  3. No integration. Same as option 2, but with step 2.3 replaced by: manually approve and merge such PRs on GitHub if the EasyCLA check passes.
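
As a rough illustration of step 2.2, here is a minimal sketch of the co-author part only. It assumes the per-line editors have already been fetched from the TLMP (that API call is not shown and would differ per platform) and arrive as hypothetical `(name, email)` pairs. The `Co-authored-by:` trailer is the convention GitHub recognizes; the function names are made up for this example.

```python
from typing import Iterable, List, Tuple


def co_author_trailers(line_editors: Iterable[Tuple[str, str]]) -> List[str]:
    """Build de-duplicated Co-authored-by trailers, keeping first-seen order.

    `line_editors` is a hypothetical list of (name, email) pairs, one per
    translated line, as it might be returned by a TLMP API.
    """
    seen = set()
    trailers = []
    for name, email in line_editors:
        key = email.lower()  # treat emails case-insensitively
        if key in seen:
            continue
        seen.add(key)
        trailers.append(f"Co-authored-by: {name} <{email}>")
    return trailers


def commit_message(subject: str, line_editors: Iterable[Tuple[str, str]]) -> str:
    """Append the trailers to a commit subject, separated by a blank line."""
    trailers = co_author_trailers(line_editors)
    if not trailers:
        return subject
    return subject + "\n\n" + "\n".join(trailers)
```

The resulting message could then be passed to `git commit -F -`, so that every editor shows up for the EasyCLA check once the co-author issue is fixed.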

Why is this needed

When you write docs or code in one language, you work only with the current state of the docs: everything you need to track is tracked by git. When you translate, you deal with two sources of truth: the original and the existing translation, which can be partial, outdated, or placed in locations that have since been moved or removed in the original, plus other edge cases. Git has no easy way to show which changes in the original should also be revisited in the translation, so almost every translated doc becomes unmaintained from the moment it is merged, because it is sometimes simpler to redo a translation from scratch than to figure out what changes are needed.

Long story short: git is just the wrong tool for translations. It is as bad for this as using .zip files for version control, or sending a letter to another continent by carrier pigeon and hoping for an answer within 3 working days.

Also, tech writers, students, and newcomers who'd like to contribute in most cases have little or no knowledge of how git and GitHub work, don't have a GitHub account, and so on. These are just not intuitive tools for non-technical folks.

So, here is what a good enough TLMP will provide:

  • Tracking changes in the original and requesting new translations for those changes
  • One-click suggested translations from Google Translate/DeepL/etc., which should be reviewed at least twice: by the user who added the auto-translation and by the reviewer(s)
  • Built-in CLA support
  • A GitHub integration
  • An interface that is easy for translators to understand
  • An easy way to find untranslated and unreviewed strings/files
  • The ability to keep several translation suggestions for a single string for the reviewer to choose from
  • An easy mechanism to sign up and sign the CLA from the start
  • Conformance with all LF and CNCF regulations (or those regulations changed to fit the TLMP)
  • ... ?

Comments

@sftim asked me to open this as an issue here so that work on it can be tracked.

Related Linux Foundation issue

Screenshots of the LF issue in case you can't see it

Read the messages from bottom to top.

Screenshot from 2024-02-16 19-35-04

Screenshot from 2024-02-16 19-34-48

Screenshot from 2024-02-16 19-34-31

P.S. This started as a Ukrainian localization team initiative, but we were blocked, from a legal perspective, from merging this kind of change back from a TLMP.

MaxymVlasov avatar Feb 16 '24 19:02 MaxymVlasov

/area localization
/priority important-longterm
/triage accepted
/kind feature

sftim avatar Feb 17 '24 16:02 sftim

For Transifex: if Transifex commits a change, is there any license or copyright asserted by Transifex?

(if not: I think we could make a tool that allows contributors to adopt those changes and confirm, through the tool, that their CLA applies to the contribution(s) in that commit).

sftim avatar Feb 17 '24 16:02 sftim

Consider integrating Crowdin (https://support.crowdin.com/enterprise/authentication-settings/) along with Transifex for translation purposes. Crowdin offers a comprehensive set of authentication settings, as illustrated in the image below:

Crowdin Authentication Methods

Additionally, Crowdin provides a versatile range of translation tools, enhancing the efficiency and engagement of contributors in the translation process. The image below showcases the various tools available:

Crowdin Translation Tools

Encouraging the use of Computer Assisted Translation (CAT) tools like Transifex and Crowdin can significantly streamline the translation process and boost overall contributor engagement. It would be beneficial for CNCF to explore and embrace these tools for a more efficient and collaborative translation experience.

Andygol avatar Feb 18 '24 14:02 Andygol

I like it. Translation memory is always an asset!

How about introducing ChatGPT, which can provide better results than traditional machine translation?

windsonsea avatar Feb 19 '24 11:02 windsonsea

Let's have one issue per change we need to make; if we'd like to have a wider discussion, see:

as repo discussions

sftim avatar Feb 19 '24 13:02 sftim

@MaxymVlasov & colleagues: do you have a reply for https://github.com/kubernetes/website/issues/45175#issuecomment-1950248435 ?

sftim avatar Feb 19 '24 13:02 sftim

Hello @sftim .

if Transifex commits a change, is there any license or copyright asserted by Transifex?

According to https://www.transifex.com/legal/terms/, the answer to the quoted question is:

Transifex does not assert any ownership rights over the content a person submits, including text published via git repositories for localization. The terms specify that while using Transifex services, you retain full ownership of your content ("your words"), and Transifex only requires limited rights to perform the services requested by the localizer/customer. This includes actions like hosting, sharing, or otherwise processing your content as directed by you, without claiming any copyright or license over it.

cc @MaxymVlasov @Andygol

OleksaBaida avatar Feb 19 '24 14:02 OleksaBaida

@sftim a pretty similar conclusion applies to Crowdin, mentioned by @Andygol: https://support.crowdin.com/terms/

Clients are responsible for ensuring they have the necessary rights to their content. Crowdin asserts no ownership over client data, which includes text submitted for translation or any other purpose

cc @MaxymVlasov

OleksaBaida avatar Feb 19 '24 14:02 OleksaBaida

To discuss the choice of tools, please go to: Translation tooling for Kubernetes localization

sftim avatar Feb 19 '24 14:02 sftim

OK, so how about we document a workflow (I'm thinking git rebase) where people adopt the changes that Transifex has committed, and then we look at automating away the toil from the manual process.

Does that sound OK?

sftim avatar Feb 19 '24 14:02 sftim

@sftim

a workflow (I'm thinking git rebase) where people adopt the changes that Transifex has committed

The main blocker for a reasonable flow with Transifex is that EasyCLA is unable to process commits that include co-authors. If a general, unproven workflow description is good enough for the moment, we will provide it.

OleksaBaida avatar Feb 19 '24 14:02 OleksaBaida

Broadly, here's what I suggest:

  • have a branch
  • get Transifex to commit to that branch
  • pull the commits
  • amend the commit author or reset to the parent commit and make a new commit
  • git push --force-with-lease

If we make a script, it would be a script to do the equivalent steps.
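
A minimal sketch of what such a script might start from, assuming a hypothetical branch name, remote, and author string. It only builds the git invocations for the steps above and does not run them, so it can be reviewed as a dry run first. It covers the "amend the commit author" variant, and only for the tip commit of the branch.

```python
import shlex
from typing import List


def adoption_commands(branch: str, author: str, remote: str = "origin") -> List[List[str]]:
    """Return the git invocations for the manual 'adopt' flow, without running them.

    `branch`, `author` and `remote` are placeholders supplied by the caller.
    Only the tip commit is re-authored; a multi-commit branch would need a
    rebase or a reset to the parent followed by a fresh commit instead.
    """
    return [
        ["git", "fetch", remote, branch],
        ["git", "checkout", branch],
        ["git", "merge", "--ff-only", f"{remote}/{branch}"],
        # Re-author the commit that Transifex made, keeping its message.
        ["git", "commit", "--amend", "--no-edit", f"--author={author}"],
        # Refuse to overwrite the remote branch if it moved since the fetch.
        ["git", "push", "--force-with-lease", remote, branch],
    ]


def render(commands: List[List[str]]) -> str:
    """Format the commands as shell-quoted lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Calling `print(render(adoption_commands("l10n-uk", "Jane Doe <jane@example.com>")))` prints the commands for inspection; running them (for example with `subprocess.run(cmd, check=True)`) is left to the operator.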

sftim avatar Feb 19 '24 14:02 sftim

Why is the question of signing the CLA for translation work inside Transifex/Crowdin/etc. off the radar?

That approach makes work easier not only for translators but also for reviewers and approvers, who don't have to worry about the complexities of managing content with git commands.

Andygol avatar Feb 19 '24 14:02 Andygol

@sftim when Transifex (or any other LMP) proposes a commit to be pulled into the main branch, that commit may already contain the result of teamwork rather than the work of one author, already reviewed and approved by the owners on the localization management platform side.

Managing work this way saves a lot of time and effort and improves translation quality dramatically.

So our plan is something like:

  • have a branch in /kubernetes-i18n-ukrainian/website
  • get Transifex to commit to that branch
  • pull a commit
  • use the Transifex CLI/API to get the list of everyone involved in the final translation in the commit
  • amend the commit to add everyone involved as co-authors (the commit is already owned by the Transifex bot, which should be approved by the EasyCLA bot)
  • push the commit to /k/website
  • EasyCLA recognizes all authors and accepts the commit

UPD: This describes an approach where all authors have signed the CLA with the EasyCLA bot, not in the localization management platform; it's closer to the current situation. If Legal and management agree that localizers could sign the CNCF CLA on the platform (Transifex allows this, for example), it would unblock the process even more

OleksaBaida avatar Feb 19 '24 14:02 OleksaBaida

Let me try to rephrase my question.

❓ Must we track the original contributors to translations on GitHub, or is it possible to accomplish the same task using a different platform, such as a localization platform?

Andygol avatar Feb 19 '24 14:02 Andygol

When we localize manually, we don't list the original contributors as co-authors (and that's fine). No need to change when automating more of the process.

sftim avatar Feb 19 '24 16:02 sftim

I'd say we take it as implied that localized text has the upstream (English) contributors as co-authors.

sftim avatar Feb 19 '24 16:02 sftim

  • have a branch in /kubernetes-i18n-ukrainian/website

  • get Transifex to commit to that branch

  • pull a commit

  • use the Transifex CLI/API to get the list of everyone involved in the final translation in the commit

  • amend the commit to add everyone involved as co-authors (the commit is already owned by the Transifex bot, which should be approved by the EasyCLA bot)

  • push the commit to /k/website

  • EasyCLA recognizes all authors and accepts the commit

Although you're welcome to work in kubernetes-i18n-ukrainian/website, we'd prefer to support you so that this work happens within the Kubernetes organizations on GitHub. If you (the Ukrainian team) feel that there are barriers to working within Kubernetes, please tell us about what feels difficult. We'd like to address those barriers rather than ignore them.

Beyond that, change one more detail, and it could work:

-amend the commit to add everyone involved as co-authors (the commit is already owned by the Transifex bot, which should be approved by the EasyCLA bot)
+change that combined commit to have the pull request submitter as the primary author, and list all other human contributors as co-authors
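
To make that change concrete, here is a small sketch of the authorship rule only: the pull request submitter becomes the commit author, bots are dropped, and every other human becomes a `Co-authored-by:` trailer. The `Person` type and the `is_bot` flag are assumptions for this example; how contributors are fetched from Transifex is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    email: str
    is_bot: bool = False  # hypothetical flag, e.g. set for the Transifex integration user


def assign_authorship(submitter: Person, contributors: List[Person]) -> Tuple[str, List[str]]:
    """Return (author string, co-author trailers) for the combined commit."""
    author = f"{submitter.name} <{submitter.email}>"
    seen = {submitter.email.lower()}  # the submitter must not also be a co-author
    trailers = []
    for person in contributors:
        key = person.email.lower()
        if person.is_bot or key in seen:
            continue
        seen.add(key)
        trailers.append(f"Co-authored-by: {person.name} <{person.email}>")
    return author, trailers
```

The author string fits `git commit --amend --author=...`, and the trailers go at the end of the commit message.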

sftim avatar Feb 19 '24 16:02 sftim

If Legal and management agree that localizers could sign the CNCF CLA on the platform (Transifex allows this, for example), it would unblock the process even more

I'm pretty sure we'd need people to sign with the Linux Foundation, or for their employers to sign (again, via LF). Using Transifex for that signing does not sound feasible; the CNCF CLA signing and tracking is something that applies across the Linux Foundation.

sftim avatar Feb 19 '24 16:02 sftim

As another take on it: does Transifex provide a way for us to only accept translations from users who have signed the CLA at the Linux Foundation, and to reject work from anyone who doesn't have a current CLA?

sftim avatar Feb 19 '24 16:02 sftim

We need to watch out for tainting: if a commit that adds or updates a localized document has any change that isn't covered by a CLA, we can't accept that work. Even if most of the work was done by other people.
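
A sketch of that rule as a check, under loud assumptions: `cla_signed_emails` is a hypothetical set obtained by some other means (this does not call EasyCLA or any Linux Foundation API), and contributors are identified by email only.

```python
from typing import Iterable, List


def uncovered_contributors(contributor_emails: Iterable[str],
                           cla_signed_emails: Iterable[str]) -> List[str]:
    """Return the contributors (lower-cased, sorted) with no CLA on record."""
    signed = {email.lower() for email in cla_signed_emails}
    return sorted({email.lower() for email in contributor_emails} - signed)


def can_accept(contributor_emails: Iterable[str],
               cla_signed_emails: Iterable[str]) -> bool:
    """Accept a commit only if every known contributor is covered by a CLA.

    A single uncovered contributor taints the whole commit. A commit with
    no known contributors is also rejected, since coverage cannot be shown.
    """
    contributors = list(contributor_emails)
    if not contributors:
        return False
    return not uncovered_contributors(contributors, cla_signed_emails)
```

The list from `uncovered_contributors` can be shown to the team so they know who still needs to sign before the translation can be submitted.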

sftim avatar Feb 19 '24 16:02 sftim

As another take on it: does Transifex provide a way for us to only accept translations from users who have signed the CLA at the Linux Foundation, and to reject work from anyone who doesn't have a current CLA?

It can require users to sign a CLA provided by the org owner before any work starts. If EasyCLA is integrated with the CLA feature in Transifex (or another TLMP), then the answer is yes.

Then we can avoid any CLA verification and commit co-author manipulation on the git side, as it will be handled during the sign-up procedure in the chosen TLMP

MaxymVlasov avatar Feb 19 '24 16:02 MaxymVlasov

The best we can do then is to ask people to confirm that they have signed the CLA and require this confirmation. We can't rely on the signature made within Transifex, and I don't expect that an EasyCLA integration is going to happen.

Folks are welcome to ask the CNCF for this, but let's build something that'll work even if we don't get it.

In other words, we don't ask people to use Transifex to sign the CLA. We ask people to use Transifex to formally confirm that, if we check their CLA status, it will show up as signed.

Then we can avoid any CLA verification and commit co-author manipulation on the git side, as it will be handled during the sign-up procedure in the chosen TLMP

Although it's a nice idea, I also think it's safe to assume that it won't happen; I can't picture a source of money or resources to make it work how we'd like.

What we can do is start with a manual process like I've outlined, and then automate that more.

sftim avatar Feb 19 '24 17:02 sftim

Hi everyone – I wanted to update this discussion with a couple of things so that we can track progress on some of this work.

SIG Docs leadership is trying to have the LF/CNCF make and communicate a decision on the following: are they willing to make a foundation policy change so that external tools can handle CLA signing, bypassing EasyCLA? This is an ask that the Ukrainian Localization Team has as part of the workflow they envision when adopting a translation and localization platform. Maksym has already raised an LF Service Desk ticket for this, and I have personally reached out to/tagged Robert Reeves from the Linux Foundation, and most recently @jeefy, Head of Projects for CNCF.

In parallel, I know that @sftim is looking to work with others on starting a PoC on what a workflow could look like for an adopted translation/localization platform, but one that specifically doesn't include an EasyCLA integration – this is because we have yet to receive any decision or confirmation on the above ask, and if the policy will not be changed, we can still look into improving the localization workflow for language teams.

Finally, I'll be looking to connect with the Steering Committee (via our liaison @justaugustus) so that whatever decision is recorded is done so for the whole project.

natalisucks avatar Feb 26 '24 13:02 natalisucks

Thank you @natalisucks for doing the heavy lifting! I just have a couple of requests for the effort that @sftim plans to collaborate on:

  • Please ensure that everything is off a public fork for the purposes of visibility and transparency. A private fork will limit how we communicate the results of the experiment to the LF/CNCF.
  • I'd also like us to open a tracking issue specifically for the experiment so that we can document the steps and the progress. We can link it to this one as a child issue.

divya-mohan0209 avatar Feb 26 '24 13:02 divya-mohan0209

:coffee: Okay, here are my thoughts/questions:

  • Is the intended workflow meant to amend/update an Open PR with translations? Or will the automation create a follow-up PR with the translations for intended languages? (Note: Please say the latter)
  • If the tooling is creating follow-up PRs to something already merged in, said-tooling does not need to do a CLA-check. We already do that check up front, no reason to duplicate it.
  • We have numerous bots/automation throughout our projects. Those bots all have CLAs signed by their creator-or-representative. Since this is a third-party-tool and not a community-built tool, there are some differences there. I'll try to get clarification internally about whether this is OK or not (And yes I will update this thread when I get guidance)
  • Public Forks Please :pray: We default to everything in the open wherever possible.
  • +1 to Divya's comment. Once the bikeshedding is done here as to what this looks like, a new thread should be created that outlines the steps and implementation details.

jeefy avatar Feb 26 '24 15:02 jeefy

If the tooling is creating follow-up PRs to something already merged in, said-tooling does not need to do a CLA-check. We already do that check up front, no reason to duplicate it.

It's a follow up in that the English content has merged, but the (eg) Ukrainian content may well not have (or there may be a substantial update). We've so far treated localization work as copyrightable.

sftim avatar Feb 26 '24 15:02 sftim

We've so far treated localization work as copyrightable.

Authors maintain the copyright since this repo's licensed under CC 4.0. Which makes sense. And with Transifex, they clearly and definitively say that the author retains the rights.

To that end, I would move forward with getting a proof of concept in place (with Transifex or something else) that submits PRs and ensure actual-humans still review the content. But, it cannot be put into "production" or have any of said-PRs merged in yet (If I could color this red I would)

If a bot submits the PR, who owns the copyright?

The bot? 
The owner of the bot? 
The person who configured the bot? 
Or the person who approves/merges the PR?

Before we merge any of the automated-translations, those questions need to be answered by the legal committee. I've escalated it up (See this reference) but fair warning we're in the shadow of KubeCon, the legal committee won't likely get to this till April. Just setting an expectation there.

Hopefully this unblocks y'all a bit.

jeefy avatar Feb 26 '24 17:02 jeefy

@jeefy I would appreciate it if you take a look at this discussion as well — https://github.com/kubernetes/website/discussions/45209

Andygol avatar Feb 26 '24 18:02 Andygol

In parallel, I know that @sftim is looking to work with others on starting a PoC on what a workflow could look like for an adopted translation/localization platform, but one that specifically doesn't include an EasyCLA integration

@natalisucks easy-peasy! Without EasyCLA compliance confirmation, it would take practically no time to start work on translating a source stored in git. Last year we got stuck on the EasyCLA bot, which still needs to be changed for us to move forward.

The only point that concerns me is that the trial period is limited in time (after the trial we need to switch to a paid plan or comply with the platform's open source "agreement"), so it would be great to have all the volunteers for localization work via Transifex (or any other TLMP) lined up before we start.

That way we could go through the test and collect opinions on the pros and cons as quickly as possible.

Let's set a deadline and collect a list of volunteers from the different localization teams.

PS

and I have personally reached out to/tagged Robert Reeves from the Linux Foundation

The Ukrainian Localization team had personally reached out to Robert Reeves before raising this question AGAIN with SIG Docs leadership, to make sure that this time we have some support on the CNCF side for changes to improve, modernize, and streamline the localization process.

@MaxymVlasov I think this could be a great moment to summarize what we've discussed, what we have on the table at the moment, and so on.

OleksaBaida avatar Feb 26 '24 21:02 OleksaBaida