corpora topic
CorpusLoaders.jl
A variety of loaders for various NLP corpora.
EVALution
Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Chinese
textstelle
Textstelle is a collection of corpora for the creation of bots and other things that generate text 🤖
wiki-dump-reader
Extract corpora from Wikipedia dumps
biomedical_corpora
Table compiling the list of biomedically related corpora available for named entity recognition (some are also suitable for association detection). The first version was published as part of the paper...
lyrics-corpora
An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and Billboard charts
data-format
The Data Format for Digital Linguistics (DaFoDiL)
CCAE
The Official Repository for 👉 CCAE: A Corpus of Chinese-based Asian Englishes @ NLPCC 2023
lm-biomedical-clinical-es
Official source for Spanish pretrained biomedical and clinical language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
huner
Named Entity Recognition for biomedical entities