corpora topic
self_dialogue_corpus
The Self-dialogue Corpus: a collection of self-dialogues across music, movies, and sports
entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
fuzzdata
Fuzzing resources for feeding various fuzzers with input. 🔧
indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
lm-spanish
Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
nltk_data
NLTK Data
gensim-data
Data repository for pretrained NLP models and NLP corpora.
opencorpora
A web-based engine for creating and annotating textual corpora