open-lid-dataset
Add languages of the AmericasNLP MT Shared Task
The AmericasNLP Shared Task on Machine Translation into Indigenous Languages covers 11 Indigenous languages of the Americas: Aymara, Bribri, Asháninka, Chatino, Guarani, Wixarika, Nahuatl, Otomí, Quechua, Shipibo-Konibo, and Rarámuri. You can find data for these languages in this year's GitHub repository.
Would it be feasible to add them to the open-lid-dataset? I would be more than happy to help make this possible!
Thanks :)
Hello! I'm excited to hear about the task and the data available. I hope it goes well!
Currently, all the languages in OpenLID are included in the FLORES+ dataset, so we have a common basis for evaluation. Is there any scope to translate FLORES+ into the languages covered by your shared task? Please see OLDI for more details.
In any case, I plan to add more languages in a batch at some point this year, so thank you for letting me know about this data!
Hi Laurie, sorry for taking so much time to answer.
Currently we do not have FLORES+ translations, but we have now discussed it, and this may be something we could do next year. We will definitely let you know!