Anoop Kunchukuttan

Results 93 issues of Anoop Kunchukuttan

https://github.com/neulab/covid19-datashare

wontfix

- ITRANS, IAST, WX, BrahmiNet and other romanization standards.
- IPA-Indic scripts standard from IIT Madras
- BIS Postag set

https://www.kaggle.com/disisbig/datasets

SUPARA0.8M: A BALANCED ENGLISH-BANGLA PARALLEL CORPUS https://ieee-dataport.org/documents/supara08m-balanced-english-bangla-parallel-corpus around 20k sentences

wontfix

[Semantic Relatedness dataset](https://arxiv.org/abs/2402.08638) 4 Indic languages: hin, mar, pan, tel. 300-1000 sentence pairs in the test set

Paper: https://arxiv.org/abs/2312.11361
Repo: https://github.com/project-miracl/nomiracl
Languages: bn, hi, te
* A test set to evaluate whether a paragraph contains an answer to a query
* Use for evaluating hallucinations and error-rates in...

https://arxiv.org/abs/2305.08828 14 languages, for testing

Classification datasets for 10 Indian languages based on news articles. More challenging than existing sets. [Paper](https://arxiv.org/abs/2401.02254) [Repository](https://github.com/l3cube-pune/indic-nlp)

Chandamama Dataset and Small LLM for Telugu (upcoming) https://www.linkedin.com/feed/update/urn:li:activity:7147934433545773056/ Dataset: https://huggingface.co/datasets/swechatelangana/chandamama-kathalu