corpora topic
Open-korean-corpora
Open Korean NLP Dataset Curation for Users All Around the Globe
CrossNER
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
awesome-cantonese-nlp
A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP
corporaexplorer
An R package for dynamic exploration of text collections
kontext
An advanced, extensible web front-end for the Manatee-open corpus search engine
Arabic-News-Article-Classification
Automatic document categorization consists of assigning a category to a text based on the information it contains. The project follows several different supervised machine learning approaches.
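One common supervised approach to this kind of categorization is a multinomial Naive Bayes classifier. The sketch below is not the method used in the Arabic-News-Article-Classification repository. The class `NaiveBayesClassifier`, the whitespace `tokenize` helper and the toy English training sentences are all hypothetical. Real Arabic news text would also need proper tokenization and normalization.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: naive lowercase whitespace tokenization only.
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)    # token counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data with two categories.
train_texts = [
    "the team won the match",
    "goal scored in the final match",
    "the bank raised interest rates",
    "stock market prices fell",
]
train_labels = ["sports", "sports", "economy", "economy"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
pred_sports = clf.predict("the match ended with a late goal")   # "sports"
pred_economy = clf.predict("interest rates fell sharply")       # "economy"
```

Working in log space avoids floating-point underflow when many small per-token probabilities are multiplied. Add-one smoothing keeps unseen tokens from zeroing out a category's score.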
spanish-corpora
Unannotated Spanish corpora totaling 3 billion words
OPIEC
Reading the data from OPIEC, an Open Information Extraction corpus
PotTS
The Potsdam Twitter Sentiment Corpus
parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
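The kind of parallel-corpus cleaning described above can be sketched with a few widely used heuristics: whitespace normalization, removal of empty sides, a maximum sentence length, a source/target length-ratio limit, removal of untranslated copies, and deduplication. The function `clean_parallel` and its thresholds (`max_len`, `max_ratio`) are hypothetical. They do not reflect the actual interface or rules of parallel-corpora-tools. The sketch assumes the sentence pairs are already aligned and counts whitespace-separated tokens.

```python
def clean_parallel(pairs, max_len=200, max_ratio=3.0):
    """Hypothetical filter for aligned (source, target) sentence pairs."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Collapse runs of whitespace and strip leading/trailing spaces.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        if not src or not tgt:
            continue  # drop pairs with an empty side
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # drop overly long sentences
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop pairs whose lengths differ too much
        if src == tgt:
            continue  # drop untranslated copies
        if (src, tgt) in seen:
            continue  # drop exact duplicates (after normalization)
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


# Hypothetical English-German sample.
pairs = [
    ("Hello world", "Hallo Welt"),
    ("Hello   world ", "Hallo Welt"),  # duplicate once whitespace is normalized
    ("", "Leer"),                      # empty source side
    ("Yes", "Ja, das ist eine sehr lange und unpassende Übersetzung"),  # ratio 9 > 3
    ("OK", "OK"),                      # source copied into target
    ("Good morning", "Guten Morgen"),
]
cleaned = clean_parallel(pairs)
```

Rule-based filters like these are cheap first passes. Cleaning pipelines often add language identification or alignment scoring on top of them.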