nlp-public-dataset
Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.
NLP-dataset (General)
- Huggingface, datasets
- Awesome-Chinese-NLP, Chinese
- CLUEDatasetSearch, Chinese
- funNLP, Chinese
- ChineseNLPCorpus1, Chinese
- ChineseNLPCorpus2, Chinese
- CLUE, Chinese
- Chinese NLP data by ShannonAI, Chinese
- nlp-datasets, Multilingual
- awesome-nlp, Multilingual
Word Segmentation (Chinese)
NER dataset (English)
- various NER datasets
- CoNLL-2003, Official, CoNLL-2003, other link
- WNUT-2016, Twitter
- OntoNotes-5.0, broadcast news, broadcast conversation, weblogs, magazine genre
- Wikigold
- Kaggle
- MUC6
- MUC7
NER dataset (Chinese)
Machine Translation (Chinese-English)
- WMT 2020
- AI challenger (the largest English-Chinese bilingual parallel dataset in the spoken-language domain)
- UM-Corpus: A Large English-Chinese Parallel Corpus
- OpenSubtitles2016
- MultiUN