corpora topic

List corpora repositories

CorpusLoaders.jl

Stars: 32 | Forks: 13

A variety of loaders for various NLP corpora.

EVALution

Stars: 16 | Forks: 6

Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Chinese

textstelle

Stars: 18 | Forks: 3

Textstelle is a collection of corpora for the creation of bots and other things that generate text 🤖

wiki-dump-reader

Stars: 23 | Forks: 7

Extract corpora from Wikipedia dumps

biomedical_corpora

Stars: 18 | Forks: 4

Table compiling the list of biomedical corpora available for named entity recognition (some are also suitable for association detection). The first version was published as part of the paper...

lyrics-corpora

Stars: 18 | Forks: 1

An unofficial Python API that lets users build a corpus of lyrics from their favorite artists and from Billboard charts

data-format

Stars: 21 | Forks: 0

The Data Format for Digital Linguistics (DaFoDiL)

CCAE

Stars: 60 | Forks: 2

The Official Repository for 👉 CCAE: A Corpus of Chinese-based Asian Englishes @ NLPCC 2023

lm-biomedical-clinical-es

Stars: 23 | Forks: 1

Official source for Spanish pretrained biomedical and clinical language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

huner

Stars: 47 | Forks: 11

Named Entity Recognition for biomedical entities