yoruba-text
Yorùbá language training text for NLP, ASR and TTS tasks
Results
5
yoruba-text issues
Add https://oscar-corpus.com, Common Crawl from the BBC, to the working corpus for ADR and other monolingual tasks. Language | Words original | Size original | File original | Words...
enhancement
Rather than the current system, where each sub-corpus has its own folder with its own code, create a top-level `downloads.sh` which can re-assemble the sub-corpora. Separately, have the downloaded &...
enhancement
help wanted
good first issue
Right now, OCR texts (Aaro Meta, Ogboju, etc.) suffer from errors intrinsic to the OCR process (non-Yorùbá characters, inconsistencies from sentence to sentence that don't reflect a human's authors...
bug
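The OCR issue above mentions non-Yorùbá characters. As a rough illustration only (not code from the repository), a first-pass check might flag characters outside an assumed Yorùbá inventory. The function name `suspicious_chars` and both character sets are assumptions made for this sketch: the base letters exclude c, q, v, x and z, and the allowed combining marks are grave, acute, macron and dot below.

```python
import unicodedata

# Assumed base letters of the Yorùbá alphabet (c, q, v, x, z are not used).
YORUBA_BASE = set("abdefghijklmnoprstuwy")
# Assumed combining marks: grave, acute, macron (tones) and dot below (ẹ, ọ, ṣ).
YORUBA_MARKS = {"\u0300", "\u0301", "\u0304", "\u0323"}


def suspicious_chars(text):
    """Return a sorted list of characters outside the assumed Yorùbá inventory."""
    # NFD splits precomposed letters into a base letter plus combining marks,
    # so "ọ́" is checked as "o" + dot below + acute.
    decomposed = unicodedata.normalize("NFD", text)
    flagged = set()
    for ch in decomposed:
        if ch.isalpha():
            if ch.lower() not in YORUBA_BASE:
                flagged.add(ch)
        elif unicodedata.combining(ch):
            if ch not in YORUBA_MARKS:
                flagged.add(ch)
        # Digits, whitespace and punctuation are ignored in this sketch.
    return sorted(flagged)
```

A check like this only surfaces candidate lines for review. It would also flag legitimate loanwords and proper names, so it is not a substitute for manual correction.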
1) Alaroye https://alaroye.tv/category/iroyin
2) https://www.bbc.com/yoruba
enhancement
help wanted