OPUS
WikiTitles en-ru is ru-en
I noticed weird scores while analyzing WikiTitles/v3 for the en-ru language pair. It turned out that the direction of the downloaded dataset is the opposite of the language codes:
(base) admins-MBP:data epavlov$ head WikiTitles.en-ru.en
Hijiri
Литва
Россия
Слоновые
Мамонты
Красная книга
Соционика
Школа
Лингвистика
Социология
(base) admins-MBP:data epavlov$ head WikiTitles.en-ru.ru
Hijiri
Lithuania
Russia
Elephantidae
Mammoth
IUCN Red List
Socionics
School
Linguistics
Sociology
https://opus.nlpl.eu/WikiTitles/en&ru/v3/WikiTitles
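In case it helps with checking other pairs, here is a rough sketch of how the swap could be detected automatically. It only looks at which script dominates each side, so it can only work for pairs that use different scripts (en-ru yes, en-cs no). The helper names `script_ratio` and `looks_swapped` and the 0.5 threshold are my own assumptions, not anything from the OPUS tooling, and the sample lines are just the first few lines from the `head` output above.

```python
import unicodedata


def script_ratio(lines, script_prefix):
    """Fraction of alphabetic characters whose Unicode name starts with script_prefix."""
    letters = [ch for line in lines for ch in line if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


def looks_swapped(en_lines, ru_lines, threshold=0.5):
    """True if the 'en' side is mostly Cyrillic and the 'ru' side is mostly Latin.

    The threshold is an arbitrary assumption; titles such as 'Hijiri'
    can legitimately appear in Latin script on both sides.
    """
    return (
        script_ratio(en_lines, "CYRILLIC") > threshold
        and script_ratio(ru_lines, "LATIN") > threshold
    )


# First lines of WikiTitles.en-ru.en and WikiTitles.en-ru.ru as reported above.
en_side = ["Hijiri", "Литва", "Россия", "Слоновые", "Мамонты"]
ru_side = ["Hijiri", "Lithuania", "Russia", "Elephantidae", "Mammoth"]

print(looks_swapped(en_side, ru_side))
```

For real files, the two lists would be read from the downloaded `.en` and `.ru` files instead of being hard-coded; a sample of a few hundred lines should be enough for a script check like this.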
Oh, that's bad. Do you know whether many other language pairs are affected in the same way? I need to look into this. Thanks for noting!
The UI doesn't show which other languages are supported, but English to Czech looks correct. I guess something got broken for this particular dataset: