corpus-tools topic

Repositories tagged with the corpus-tools topic

audiomate

130 stars, 26 forks

Python library for handling audio datasets.

simplemma

130 stars, 10 forks

Simple multilingual lemmatizer for Python, designed for speed and efficiency

trafilatura

3.0k stars, 228 forks

Python package and command-line tool for gathering text from the Web: web crawling and scraping, and extraction of text, metadata, and comments

Wordless

673 stars, 88 forks

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

bitextor

287 stars, 43 forks

Bitextor generates translation memories from multilingual websites

ua-gec

255 stars, 21 forks

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

beta

63 stars, 2 forks

An open-source reimplementation of Benny Brodda's BETA in Python

kontext

59 stars, 22 forks

An advanced, extensible web front-end for the Manatee-open corpus search engine

OPIEC

36 stars, 6 forks

Reading data from OPIEC, an Open Information Extraction corpus