Generate all translations for the currently supported languages [was Colab testing]

Open andrewtavis opened this issue 1 year ago • 7 comments
Terms

Issue

As part of the process of working towards multilingual translation, we need to test running the translation processes in a

andrewtavis avatar Feb 24 '24 13:02 andrewtavis

hey! I would like to help with this issue.

byt3h3ad avatar Feb 27 '24 23:02 byt3h3ad

Hey @byt3h3ad 👋 Thanks so much for your offer to help! I'll assign you, and once we have one of the new ones finished we can get to this issue. You'd also be welcome to work on one of the translation issues as well! 😊

andrewtavis avatar Feb 28 '24 08:02 andrewtavis

Hey @byt3h3ad! We finally have some of the new translation processes up and running. If you wanted to give it a shot using the scribe_data/extract_transform/languages/English/translations/translate_words.py file and document how to get it up and running, then that'd be great!

andrewtavis avatar Mar 18 '24 01:03 andrewtavis

Hello @andrewtavis, I ran the repo in Google Colab. As expected, it shows the same error as in issue #96.

When I update the translation_utils file and set translations = [], the same as in #96, it works in Google Colab. Could you please check it?

image

axif0 avatar Jul 16 '24 14:07 axif0

Nice, @axif0! Give me a moment to do the check here, but this is great!

andrewtavis avatar Jul 16 '24 17:07 andrewtavis

Assigning you as well to show credit for the work here :)

andrewtavis avatar Jul 16 '24 17:07 andrewtavis

Switching the context of this issue from checking out Google Colab to generating the translations, as per @axif0 it sounds like the processes we have written here can't be finished even using Colab GPUs. I'm going to try to run these things locally over a few nights and then we can call this issue good, as the plan is not to have this process running on machine translations in the long term. Ultimately Scribe-Data will run on Wiktionary-based data, so let's close this with the current rendition and then start shifting towards the new methods :)

andrewtavis avatar Aug 11 '24 20:08 andrewtavis

@axif0, you were the one who'd said that the translation process didn't finish on Colab, right? Did you use GPUs for it, or just CPUs? To my memory they don't have GPUs available by default.

andrewtavis avatar Sep 02 '24 21:09 andrewtavis

@andrewtavis I used a TPU v2.8. The notebook is also here: Google Colab.

image

And sorry for the late reply. :(

axif0 avatar Sep 11 '24 17:09 axif0

Thanks, @axif0! The plan is that this weekend I'll get Colab Pro and run through the process :) Will update after that!

andrewtavis avatar Sep 11 '24 18:09 andrewtavis

180ad64 is the result of all of the machine translations. We now need to rework the SQLite process to put them all in a TranslationData.sqlite file :)
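
As a rough idea of what that SQLite step could look like, here is a minimal sketch using only Python's standard `sqlite3` module. The `write_translations` function, the single `translations` table, its column names, and the nested-dict input shape are all assumptions made for illustration. They are not the actual Scribe-Data schema or code.

```python
import sqlite3


def write_translations(db_path, translations):
    """Write nested translation data into one SQLite file.

    Assumed (hypothetical) input shape:
        {source_language: {word: {target_language: translation}}}
    Returns the number of rows written.
    """
    # Flatten the nested dicts into one row per (source, word, target).
    rows = [
        (source, word, target, text)
        for source, words in translations.items()
        for word, targets in words.items()
        for target, text in targets.items()
    ]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            # Hypothetical schema: a single table keyed on source, word, target.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS translations ("
                "source_language TEXT NOT NULL, "
                "word TEXT NOT NULL, "
                "target_language TEXT NOT NULL, "
                "translation TEXT NOT NULL, "
                "PRIMARY KEY (source_language, word, target_language))"
            )
            # INSERT OR REPLACE keeps re-runs from duplicating rows.
            conn.executemany(
                "INSERT OR REPLACE INTO translations VALUES (?, ?, ?, ?)",
                rows,
            )
    finally:
        conn.close()
    return len(rows)
```

Calling something like `write_translations("TranslationData.sqlite", data)` after loading the per-language output would then put everything in the one file. The composite primary key is there so that running the step twice overwrites rows rather than appending duplicates.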

andrewtavis avatar Sep 15 '24 19:09 andrewtavis

The above commits close this 🚀 The current data process takes an extremely long time, which is basically prohibitive for it being run again, but we just need this for the next release, and then we'll move on to the new translation process once Outreachy is done 😊

andrewtavis avatar Sep 15 '24 20:09 andrewtavis