corpora topic

List of corpora repositories

Open-korean-corpora

133
Stars
9
Forks

Open Korean NLP Dataset Curation for the Users All Around the Globe

CrossNER

116
Stars
25
Forks

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

awesome-cantonese-nlp

82
Stars
4
Forks

A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP

corporaexplorer

63
Stars
4
Forks

An R package for dynamic exploration of text collections

kontext

59
Stars
22
Forks

An advanced, extensible web front-end for the Manatee-open corpus search engine

Arabic-News-Article-Classification

89
Stars
24
Forks

Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow several supervised machine learning approaches.

spanish-corpora

87
Stars
10
Forks

Unannotated Spanish 3 Billion Words Corpora

OPIEC

36
Stars
6
Forks

Reading the data from OPIEC - an Open Information Extraction corpus

PotTS

17
Stars
4
Forks

The Potsdam Twitter Sentiment Corpus

parallel-corpora-tools

40
Stars
16
Forks

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.