low-resource-languages topic
calamanCy
NLP pipelines for Tagalog using spaCy
Semi-Supervised-NMT-for-Sumerian-English
Exploring the Limits of Low-Resource Neural Machine Translation
GlotLID
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Turkish-Speech-to-Text
Fine-tuning for automatic speech recognition on low-resource languages with a character-based CTC model
BembaSpeech
An ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, online sources...
vad-sli-asr
A pipeline to isolate and transcribe one language in mixed-language speech
thesis
My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
relm_unmt
Python source code for the EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".