Scribe-Data
Generate all translations for the currently supported languages [was Colab testing]
Terms
- [X] I have searched open and closed issues
- [X] I agree to follow Scribe-Data's Code of Conduct
Issue
As part of the process of working towards multilingual translation, we need to test running the translation processes in a
hey! I would like to help with this issue.
Hey @byt3h3ad 👋 Thanks so much for your offer to help! I'll assign you, and once we have one of the new ones finished we can get to this issue. You'd also be welcome to work on one of the translation issues as well! 😊
Hey @byt3h3ad! We finally have some of the new translation processes up and running. If you wanted to give it a shot using the scribe_data/extract_transform/languages/English/translations/translate_words.py file and document how to get it up and running, then that'd be great!
Hello @andrewtavis, I ran the repo in Google Colab. As expected, it shows the same error as in issue #96.
When I update the translation_utils file and set translations = [] as in #96, it works in Google Colab. Could you please check it?
Nice, @axif0! Give me a moment to do the check here, but this is great!
Assigning you as well to show credit for the work here :)
Switching the context of this issue from checking out Google Colab to generating the translations, as per @axif0 it sounds like the processes we have written here can't be finished even using Colab GPUs. I'm going to try to run these things locally over a few nights and then we can call this issue good, as the plan is not to have this process running on machine translations in the long term. Ultimately Scribe-Data will run on Wiktionary-based data, so let's close this with the current rendition and then start shifting towards the new methods :)
@axif0, you were the one who'd said that the translation process didn't finish on Colab, right? Did you use GPUs for it, or just CPUs? To my memory they don't have GPUs available by default.
Thanks, @axif0! Plan is that this weekend I'll get Colab pro and run through the process :) Will update after that!
180ad64 is the result of all of the machine translations. We now need to rework the SQLite process to put them all in a TranslationData.sqlite file :)
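For anyone picking up the SQLite rework, here is a rough sketch of what loading the translations into a single file could look like. It is not the actual Scribe-Data process: the function name `write_translations`, the one-table-per-language layout, and the `word` / `translations` columns are all assumptions made for illustration, as is the shape of the input dictionary.

```python
import json
import sqlite3


def write_translations(db_path, translations_by_language):
    """Write translations into one SQLite file, one table per source language.

    Hypothetical input shape (an assumption, not the real Scribe-Data format):
        {"English": {"book": {"de": "Buch", "fr": "livre"}}}
    """
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            for language, words in translations_by_language.items():
                # Quote the table name so language names with spaces still work.
                table = '"' + language.replace('"', '""') + '"'
                conn.execute(f"DROP TABLE IF EXISTS {table}")
                conn.execute(
                    f"CREATE TABLE {table} (word TEXT PRIMARY KEY, translations TEXT)"
                )
                # Store each word's translations as a JSON string.
                conn.executemany(
                    f"INSERT INTO {table} (word, translations) VALUES (?, ?)",
                    [
                        (word, json.dumps(targets, ensure_ascii=False))
                        for word, targets in words.items()
                    ],
                )
    finally:
        conn.close()
```

Calling `write_translations("TranslationData.sqlite", data)` would then rebuild every language table from scratch. Dropping and recreating keeps reruns idempotent, which seems reasonable given the translations are generated in one long batch.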
The above commits close this 🚀 The current data process takes an extremely long time, which basically rules out running it again, but we just need this for the next release, and then we'll move on to the new translation process once Outreachy is done 😊