Anoop Kunchukuttan

33 comments by Anoop Kunchukuttan

This version of IndicCorpus does not contain OSCAR. However, the newer version, available at https://indicnlp.ai4bharat.org/corpora/ , contains OSCAR as a subset.

https://www.iitm.ac.in/donlab/tts/downloads/cls/cls_v2.1.6.pdf

This is the TICO-19 dataset, which is very small. Paper: https://arxiv.org/pdf/2007.01788.pdf

It does not appear to be publicly available; will include it in the repo later.

Thanks, yes, beam search with attention works for a batch size of 1. Do you plan to support larger batch sizes soon? That would be really useful.

Summarizing WikiLingua:
- Cross-lingual summarization dataset created from WikiHow.
- Contains article-summary pairs formed by pairing an English article with a summary in another language (and vice versa).
- Among Indian languages, Hindi is...
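A minimal sketch of the cross-lingual pairing described above, assuming each WikiHow article is available in English and in one other language, each with its own summary. The `Doc` record and the `cross_lingual_pairs` function are hypothetical illustrations, not part of the WikiLingua release or any official loader.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    # Hypothetical record: one WikiHow article and its summary in a single language.
    lang: str
    article: str
    summary: str


def cross_lingual_pairs(english: Doc, other: Doc) -> List[Tuple[str, str]]:
    """Build (source article, target summary) pairs across languages.

    Assumes `english` and `other` are aligned versions of the same WikiHow article.
    """
    if english.lang != "en":
        raise ValueError("first argument must be the English document")
    return [
        (english.article, other.summary),  # English article -> summary in the other language
        (other.article, english.summary),  # other-language article -> English summary
    ]


if __name__ == "__main__":
    en = Doc("en", "<English article>", "<English summary>")
    hi = Doc("hi", "<Hindi article>", "<Hindi summary>")
    for src, tgt in cross_lingual_pairs(en, hi):
        print(src, "->", tgt)
```

Each aligned article thus yields two cross-lingual examples, one per direction.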

English WikiHow: https://github.com/mahnazkoupaee/WikiHow-Dataset

100 documents from 10 topics, translated from English.

Check later MultiLing workshops as well: https://aclanthology.org/W19-8901.pdf
See all proceedings here: http://multiling.iit.demokritos.gr/pages/view/1616/multiling-2017

https://aclanthology.org/W19-8903/ - Summary Evaluation - judgments across languages