
Thai is missing in NLLB

Open • gregtatum opened this issue 6 months ago • 1 comment

I see the language mentioned in the NLLB paper, but I don't see it in the datasets:

https://arxiv.org/pdf/2207.04672

https://opus.nlpl.eu/results/en&th/corpus-result-table

gregtatum, Jul 11 '25 18:07

Interesting - I took the data from https://huggingface.co/datasets/allenai/nllb and Thai seems to be missing there as well: https://huggingface.co/datasets/allenai/nllb/blob/main/nllb_lang_pairs.py I don't really know what to do about it.

jorgtied, Nov 27 '25 06:11
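
For anyone who wants to repeat the check described above, here is a minimal sketch of how one might test whether a language code appears in a list of language pairs. It assumes the pairs are available as 2-tuples of FLORES-200 style codes (Thai would be `tha_Thai`); the exact variable name and layout inside `nllb_lang_pairs.py` are not confirmed here, so `SAMPLE_PAIRS` and both helper functions are hypothetical stand-ins, not the dataset's actual contents.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def languages_in_pairs(pairs: Iterable[Pair]) -> Set[str]:
    """Collect every language code that occurs on either side of a pair."""
    langs: Set[str] = set()
    for src, tgt in pairs:
        langs.add(src)
        langs.add(tgt)
    return langs


def pairs_with_language(pairs: Iterable[Pair], code: str) -> List[Pair]:
    """Return all pairs that contain the given language code."""
    return [pair for pair in pairs if code in pair]


# Hypothetical sample data for illustration only; replace it with the
# pair list loaded from the dataset's language-pair file.
SAMPLE_PAIRS: List[Pair] = [
    ("ace_Latn", "ban_Latn"),
    ("eng_Latn", "fra_Latn"),
    ("eng_Latn", "vie_Latn"),
]

if __name__ == "__main__":
    thai = "tha_Thai"  # FLORES-200 style code for Thai
    print(thai in languages_in_pairs(SAMPLE_PAIRS))
    print(pairs_with_language(SAMPLE_PAIRS, "eng_Latn"))
```

Running the same check against the real pair list would show whether Thai is absent from every pair or only from the English-centric ones.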