nlp-datasets topic
chinese_medical_words
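A manually curated corpus of medical-industry vocabulary and terminology. Can be used to train NLP models for speech recognition, dialogue systems, and similar tasks.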
MetroTwitter
What Twitter reveals about the differences between cities and the monoculture of the Bay Area
FreebaseQA
The release of the FreebaseQA data set (NAACL 2019).
OPIEC
Reading the data from OPIEC - an Open Information Extraction corpus
infotabs-code
Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.
zi-dataset
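A Chinese character dataset, including related information for each character such as stroke count, radical, pinyin, and English definitions/synonyms.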
Datasets
Datasets with text data for use in NLP, text analysis, information extraction, and ML research.
benchie
Comprehensive evaluation framework for Open Information Extraction.
XCSR
Code repository for the ACL 2021 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning".
bilkent-turkish-writings-dataset
Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.