indicnlp_catalog

PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation

GokulNC opened this issue · 2 comments

13.5k parallel annotated sentences: https://zenodo.org/record/3605597#.YVKJ2bgzZEY

GokulNC, Sep 28 '21 03:09

6k parallel sentences from IIIT-H: https://github.com/mrinaldhar/en-hi-codemixed-corpus

Paper: https://aclanthology.org/W18-3817.pdf

GokulNC, Sep 28 '21 03:09

CALCS 2021 English-Hinglish dataset (10k pairs): https://code-switching.github.io/2021#shared-task-1

Paper: https://arxiv.org/pdf/2202.09625.pdf

GokulNC, Sep 28 '21 03:09

done

anoopkunchukuttan, Aug 14 '22 19:08