multilingual_kws
Dataset Code
Changes:
- `EDA`: folder with scripts for, you guessed it, exploratory data analysis.
- `extraction`: consolidated scripts for extraction.
- `packaging`: scripts used for tarring, uploading, getting links, filtering, etc., all using multiprocessing across the different languages.
- A new function was added to `embedding/word_extraction.py`. It can be consolidated into the older one; I made a new one so as not to disturb any existing code.

Updated .gitignore to exclude pickles, PNGs, PDFs, etc.
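For reviewers unfamiliar with the per-language multiprocessing pattern mentioned for the packaging scripts, here is a minimal sketch of the idea. It is not the code in this PR: the function names (`build_jobs`, `tar_language`, `tar_all_languages`) and the assumed layout of one subdirectory per language under a data root are hypothetical, chosen only for illustration.

```python
import multiprocessing as mp
import tarfile
from pathlib import Path


def build_jobs(data_root, out_dir):
    """List one (language_dir, out_dir) job per subdirectory of data_root.

    Assumption: each language lives in its own subdirectory of data_root.
    """
    data_root = Path(data_root)
    return [(str(d), str(out_dir)) for d in sorted(data_root.iterdir()) if d.is_dir()]


def tar_language(job):
    """Pack one language directory into <out_dir>/<language>.tar.gz."""
    lang_dir, out_dir = Path(job[0]), Path(job[1])
    archive = out_dir / f"{lang_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the language folder
        tar.add(lang_dir, arcname=lang_dir.name)
    return str(archive)


def tar_all_languages(data_root, out_dir, processes=4):
    """Tar every language directory in parallel, one worker task per language."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    jobs = build_jobs(data_root, out_dir)
    with mp.Pool(processes=processes) as pool:
        return pool.map(tar_language, jobs)
```

When run as a script, `tar_all_languages` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The output directory should sit outside the data root; otherwise it would be picked up as a language.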