open-lid-dataset
Resources for Kurdish
1M tokens / 156k sentences in several varieties of Central Kurdish: https://github.com/sinaahmadi/CORDI
Kurdish LID models and datasets: https://github.com/sinaahmadi/KurdishLID
More: https://github.com/sinaahmadi/awesome-kurdish
The KTC corpus seems useful, and maybe some of the others too.
Cheers Jaume, I'll look to fold it into the next release.