Auto-Synced-Translated-Dubs

Implement DeepL translation

Open sofiadparamo opened this issue 2 years ago • 3 comments

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] This change requires a documentation update

Proposed Changes

  • Implement DeepL translation service
    • Logic for batch translation
    • Settings
  • Introduce the deepl Python package, the official package for the DeepL API.
  • Document DeepL in the README.
  • Don't require a DeepL key when using Google Translate and, conversely, don't require a Google API key or credentials when using DeepL + Azure.
  • Fix a batch translation bug in the Google API path where the array went out of bounds because the chunks of text were compared against the whole list of texts to translate. For more info, see #26.
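
A minimal Python sketch of the chunked batching described in the last bullet, not the code from this PR. translate_chunk is a hypothetical stand-in for the Google or DeepL request, and chunk_size is an arbitrary value. The idea is that each result list is checked against its own chunk, never against the whole list of texts:

```python
from typing import Callable, List


def translate_in_chunks(
    texts: List[str],
    translate_chunk: Callable[[List[str]], List[str]],  # hypothetical API wrapper
    chunk_size: int = 100,  # assumed limit, adjust to the service's real limit
) -> List[str]:
    """Translate texts chunk by chunk and return results in the original order."""
    translated: List[str] = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results = translate_chunk(chunk)
        # Compare against the chunk, not the full list, so a short final
        # chunk can never be indexed past its end.
        if len(results) != len(chunk):
            raise ValueError(
                f"expected {len(chunk)} translations, got {len(results)}"
            )
        translated.extend(results)
    return translated
```

With five texts and chunk_size=2, the last chunk holds a single item and is handled like any other.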

I've tested everything in this PR extensively and tried to cover every path the code can take. However, since this is a big change, it would be great to test it in an environment other than mine.

Additional Info

  • Fixes #26

Checklist:

  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have tested that the code works

Required:

  • By contributing to this project, you agree to the terms of the GPLv3 license, and agree to grant the project owner the right to also provide or sell this software, including your contribution, to anyone under any other license, with no compensation to you.

sofiadparamo avatar Jan 03 '23 07:01 sofiadparamo

Cool, I don't see any obvious issues. I should be able to fully test it out in the next couple of days.

Since DeepL doesn't support as many languages as Google Translate, I can add some kind of fallback to Google Translate for languages that aren't supported, like Arabic, Hindi, Korean, etc. Should be easy enough to just hard-code a list to check against.
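
A rough sketch of what such a hard-coded check could look like. The set below only contains the three examples named above and may be outdated, since DeepL's language coverage changes over time. The function name and the "deepl"/"google" strings are made up for illustration:

```python
# Languages assumed to be unsupported by DeepL (examples from the comment above;
# verify against DeepL's current language list before relying on this).
DEEPL_UNSUPPORTED = {"ar", "hi", "ko"}


def choose_service(target_language: str, preferred: str = "deepl") -> str:
    """Return the service to use, falling back to Google for unsupported languages."""
    # Compare only the base language, so "hi-IN" is treated like "hi".
    base = target_language.split("-")[0].lower()
    if preferred == "deepl" and base in DEEPL_UNSUPPORTED:
        return "google"
    return preferred
```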

ThioJoe avatar Jan 03 '23 14:01 ThioJoe

Awesome!

It would also be possible to ask the API for the supported languages and do a dynamic fallback, that way it would stay updated with API changes on DeepL's side.
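
Something along these lines, as a sketch only. It assumes a translator object from the deepl package, whose get_target_languages() returns language objects with a code attribute. The helper names and the "deepl"/"google" return values are hypothetical:

```python
from typing import Set


def supported_target_codes(translator) -> Set[str]:
    """Ask the API which target languages it currently supports."""
    # translator is expected to behave like deepl.Translator(auth_key).
    return {lang.code.upper() for lang in translator.get_target_languages()}


def pick_service(target_language: str, supported_codes: Set[str]) -> str:
    """Use DeepL when the language is supported, otherwise fall back to Google."""
    return "deepl" if target_language.upper() in supported_codes else "google"
```

The supported set can be fetched once at startup and reused for every language in the batch.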

However, there are still some things that require attention. For example, the language en is no longer supported, only en-US or en-GB, and the same applies to pt with pt-BR and pt-PT. I don't know why they are still listed for backward compatibility, but the library raises a deprecation exception when those codes are used.
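
One way to handle this would be to normalize the codes before calling the API. A small sketch, where the choice of en-US and pt-BR as defaults is purely an assumption and could just as well come from the settings:

```python
# Default regional variants for codes DeepL no longer accepts as targets.
# Which variant to pick is an assumption here, not something the API dictates.
DEFAULT_VARIANTS = {"en": "EN-US", "pt": "PT-BR"}


def normalize_target_language(code: str) -> str:
    """Map bare 'en'/'pt' to a regional variant and upper-case everything else."""
    return DEFAULT_VARIANTS.get(code.lower(), code.upper())
```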

sofiadparamo avatar Jan 03 '23 20:01 sofiadparamo

That's awesome! We are using DeepL to translate subtitles for Khan Academy, but we then run those through Amara.org with a manual review process, so we probably won't be able to use the integration here.

Did you consider making the translation service configurable in config.ini? E.g. change skip_translation = False to translation_service = { none, Google, DeepL }
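
Reading such a setting could look roughly like this. It's only a sketch: the SETTINGS section name and the read_translation_service helper are assumptions, not what the project's config.ini actually uses:

```python
import configparser

VALID_SERVICES = {"none", "google", "deepl"}


def read_translation_service(ini_text: str) -> str:
    """Return the configured translation service, lower-cased and validated."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # "SETTINGS" is an assumed section name; "none" is used when the key is absent.
    value = parser.get("SETTINGS", "translation_service", fallback="none")
    value = value.strip().lower()
    if value not in VALID_SERVICES:
        raise ValueError(f"unknown translation_service: {value!r}")
    return value
```

With this, translation_service = none would take over the role of skip_translation = True.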

alani1 avatar Jan 03 '23 21:01 alani1