WikiTitles en-ru is ru-en

Open eu9ene opened this issue 10 months ago • 2 comments

I noticed weird scores while analyzing WikiTitles/v3 for the en-ru language pair. It turned out that the downloaded files are swapped relative to their language codes: the .en file contains Russian text and the .ru file contains English text:

(base) admins-MBP:data epavlov$ head WikiTitles.en-ru.en 
Hijiri 
Литва 
Россия 
Слоновые 
Мамонты 
Красная книга 
Соционика 
Школа 
Лингвистика 
Социология 

(base) admins-MBP:data epavlov$ head WikiTitles.en-ru.ru 
Hijiri 
Lithuania 
Russia 
Elephantidae 
Mammoth 
IUCN Red List 
Socionics 
School 
Linguistics 
Sociology 

https://opus.nlpl.eu/WikiTitles/en&ru/v3/WikiTitles

[Screenshot 2024-04-25 at 2:31:13 PM]
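
For anyone who wants to check this without reading the files by eye, here is a rough sketch of a script-based sanity check. It assumes that a correct .ru side is mostly Cyrillic and a correct .en side is mostly not. The helper names `cyrillic_ratio` and `looks_swapped` and the 0.5 threshold are made up for illustration and are not part of any OPUS tooling. The sample lines are taken from the `head` output above.

```python
import unicodedata


def cyrillic_ratio(lines):
    """Return the fraction of alphabetic characters that are Cyrillic."""
    letters = 0
    cyrillic = 0
    for line in lines:
        for ch in line:
            if ch.isalpha():
                letters += 1
                # Unicode character names for Cyrillic letters contain "CYRILLIC".
                if "CYRILLIC" in unicodedata.name(ch, ""):
                    cyrillic += 1
    return cyrillic / letters if letters else 0.0


def looks_swapped(en_lines, ru_lines, threshold=0.5):
    """Heuristic (assumed threshold): the .en side is mostly Cyrillic
    while the .ru side is mostly non-Cyrillic."""
    return (cyrillic_ratio(en_lines) > threshold
            and cyrillic_ratio(ru_lines) < threshold)


# First lines of WikiTitles.en-ru.en and WikiTitles.en-ru.ru as shown above.
en_side = ["Hijiri", "Литва", "Россия", "Слоновые", "Мамонты"]
ru_side = ["Hijiri", "Lithuania", "Russia", "Elephantidae", "Mammoth"]

print(looks_swapped(en_side, ru_side))
```

To run it on the real files, the two lists can be replaced with the first few hundred lines of each file, opened with `encoding="utf-8"`. The heuristic only works for pairs where exactly one side uses Cyrillic, so it would not help for something like en-cs.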

eu9ene, Apr 25 '24 21:04

Oh, that's bad. Do you know whether many other language pairs are affected in the same way? I need to look into this. Thanks for noting!

jorgtied, Apr 29 '24 08:04

It doesn't show in the UI what other languages are supported, but English to Czech looks correct. I guess something got broken for this dataset:

[Screenshot 2024-04-29 at 10:47:01 AM]

eu9ene, Apr 29 '24 17:04