corpus-tools topic
audiomate
Python library for handling audio datasets.
simplemma
Simple multilingual lemmatizer for Python, designed for speed and efficiency
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
bitextor
Bitextor generates translation memories from multilingual websites
ua-gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
OpusFilter
Parallel corpus processing toolkit
beta
An open-source reimplementation of Benny Brodda's BETA in Python
kontext
An advanced, extensible web front-end for the Manatee-open corpus search engine
OPIEC
Code for reading data from OPIEC, an Open Information Extraction corpus