indicnlp_catalog
PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation
13.5k parallel annotated sentences: https://zenodo.org/record/3605597#.YVKJ2bgzZEY
6k parallel sentences from IIIT-H: https://github.com/mrinaldhar/en-hi-codemixed-corpus
Paper: https://aclanthology.org/W18-3817.pdf
CALCS 2021 Eng-Hinglish dataset (10k pairs): https://code-switching.github.io/2021#shared-task-1
Paper: https://arxiv.org/pdf/2202.09625.pdf