yoruba-text
Provide a script to cleanly download and normalize text
Rather than the current system, where each sub-corpus lives in its own folder with its own code, create a top-level downloads.sh
that can re-assemble the sub-corpora.
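A minimal sketch of what such a top-level downloads.sh could look like. It assumes a hypothetical layout where each sub-corpus sits in corpora/<name>/ and may keep its own download.sh that writes *.txt files into that folder. The function names assemble and normalize, the output path build/yoruba-text.txt, and the normalization steps (stripping trailing whitespace and carriage returns, then Unicode NFC via python3 when it is installed) are all assumptions for illustration, not the repo's actual design.

```sh
#!/bin/sh
# downloads.sh: HYPOTHETICAL top-level script that re-assembles the sub-corpora.
# Assumed layout (illustrative only): corpora/<name>/ holds one sub-corpus and
# may contain its own download.sh that writes *.txt files into that folder.
set -eu

CORPORA_DIR="${1:-corpora}"             # assumed default location of sub-corpora
OUT_FILE="${2:-build/yoruba-text.txt}"  # assumed default combined output file

# Normalize a text stream: drop trailing spaces/tabs/CRs (so CRLF becomes LF),
# guarantee every line ends with a newline, then apply Unicode NFC when python3
# is available so tone marks and underdots share one canonical encoding.
normalize() {
    awk '{ sub(/[ \t\r]+$/, ""); print }' | {
        if command -v python3 >/dev/null 2>&1; then
            python3 -c 'import sys, unicodedata; sys.stdout.buffer.write(unicodedata.normalize("NFC", sys.stdin.buffer.read().decode("utf-8")).encode("utf-8"))'
        else
            cat  # no python3: whitespace cleanup only, no NFC
        fi
    }
}

# Re-assemble every sub-corpus under $1 into the single file $2.
assemble() {
    root="$1"
    out="$2"
    mkdir -p "$(dirname "$out")"
    : > "$out"  # start from an empty output file
    if [ ! -d "$root" ]; then
        echo "no sub-corpora found under '$root'" >&2
        return 0
    fi
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue  # glob matched nothing
        # Fetch raw files first if this sub-corpus ships its own download step.
        if [ -f "${dir}download.sh" ]; then
            (cd "$dir" && sh ./download.sh)
        fi
        for txt in "$dir"*.txt; do
            [ -f "$txt" ] || continue
            normalize < "$txt" >> "$out"
        done
    done
}

assemble "$CORPORA_DIR" "$OUT_FILE"
```

Usage would be `sh downloads.sh [corpora-dir] [output-file]`. NFC is worth applying here because Yoruba letters such as ẹ́ can be stored either precomposed or as a base letter plus combining marks; normalizing every sub-corpus to one form keeps them comparable once combined.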
Separately, have the downloaded and pre-processed sub-corpora ready to be referenced from the ADR and NMT repos, e.g. as git submodules or similar.