corpus-builder topic

List corpus-builder repositories

trafilatura

3.0k
Stars
228
Forks

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

librivox-tools

20
Stars
2
Forks

Collector and speech cutter for LibriVox audiobooks

Praaline

27
Stars
5
Forks

Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora