corpus topic

List corpus repositories

eKeyboard

12 Stars · 3 Forks

Make typing Amharic [on mobile] great [again].

sejong-corpus

138 Stars · 24 Forks

Korean Sejong corpus download and simple analysis

german-nouns

134 Stars · 18 Forks

A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file. Includes a module to look up the data and parse compound words.
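To give a rough idea of what "parse compound words" involves, here is a minimal sketch of a dictionary-based compound splitter. It is not the german-nouns module's API. The tiny `NOUNS` set and the function name `split_compound` are hypothetical stand-ins for the real ~100,000-entry list, and the only linking element handled is the common Fugen-s (as in Arbeit+s+amt).

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical mini-lexicon; a real splitter would load the full noun list.
NOUNS = {"fahrrad", "schloss", "haus", "tür", "arbeit", "amt"}
# Allowed linking elements between parts: none, or the Fugen-s.
LINKING = ("", "s")


def split_compound(word: str) -> Optional[List[str]]:
    """Return one segmentation of `word` into known nouns, or None."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        # Reaching the end of the word means every part was recognized.
        if start == len(w):
            return ()
        # Try longer prefixes first so larger known nouns win.
        for end in range(len(w), start, -1):
            part = w[start:end]
            if part not in NOUNS:
                continue
            for link in LINKING:
                nxt = end + len(link)
                # A linking element must be followed by another noun.
                if link and (nxt >= len(w) or not w.startswith(link, end)):
                    continue
                rest = solve(nxt)
                if rest is not None:
                    return (part,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None
```

For example, `split_compound("Fahrradschloss")` yields `["fahrrad", "schloss"]` and `split_compound("Arbeitsamt")` yields `["arbeit", "amt"]`, while a word with no known parts returns `None`. Memoizing `solve` keeps the search linear in the number of split positions rather than exponential.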

trafilatura

3.0k Stars · 228 Forks

Python & command-line tool to gather text on the Web: web crawling/scraping and extraction of text, metadata, and comments

chatterbot-corpus

1.3k Stars · 1.2k Forks

A multilingual dialog corpus

FakeNewsCorpus

377 Stars · 96 Forks

A dataset of millions of news articles scraped from a curated list of data sources.

Lenta.Ru-News-Dataset

139 Stars · 25 Forks

Corpus of Russian news articles collected from Lenta.Ru

aspen

30 Stars · 5 Forks

🔎 📖 ✨ Custom, private search engine for text documents built with NextJS/React/ES6/ES7