Parallel Corpora for 6 Indian Languages

Open kpu opened this issue 4 years ago • 2 comments

http://catalog.elra.info/en-us/repository/browse/ELRA-W0320/#

CC-BY-SA-3.0

Not sure why there isn't a download link on the main page. I guess somebody needs to log in with an ELRA account, download it, and rehost it.

kpu, Feb 22 '22 00:02

I believe this is the data that we released in this paper? In that case, there is a more direct link. I'm not sure why ELRA has appropriated it with no mention or citation.

That said, the data was translated into English by English L2 speakers. The quality isn't great, though it might serve for translating out of English.

mjpost, Jul 15 '22 22:07

I wonder why ELRA didn't mention or cite the paper! Their description looks very similar to the one in the paper. BTW, we have already added the joshua-decoder/indian-parallel-corpora corpus (see mtdata list -id -g JoshuaDec). https://github.com/thammegowda/mtdata/blob/b1c0b21d3b58c0053b3a6fa669158f71b0c7f0a7/mtdata/index/joshua_indian.py#L11

thammegowda, Jul 15 '22 22:07