corpora topic

List corpora repositories

self_dialogue_corpus

105 stars, 25 forks

The Self-dialogue Corpus: a collection of self-dialogues about music, movies and sports

entity-recognition-datasets

1.4k stars, 243 forks

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

fuzzdata

463 stars, 119 forks

Fuzzing resources for feeding various fuzzers with input. 🔧

indicnlp_catalog

527 stars, 76 forks

A collaborative catalog of NLP resources for Indic languages

lm-spanish

244 stars, 21 forks

Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

gensim-data

953 stars, 127 forks

Data repository for pretrained NLP models and NLP corpora.

opencorpora

238 stars, 23 forks

A web-based engine for creating and annotating textual corpora

corus

272 stars, 20 forks

Links to Russian corpora, plus Python functions for loading and parsing them