corpus-builder topic
corpus-builder repositories
trafilatura
3.0k stars, 228 forks
Python & command-line tool to gather text on the Web: web crawling/scraping and extraction of text, metadata, and comments
corpuscrawler
181 stars, 56 forks
Crawler for linguistic corpora
librivox-tools
20 stars, 2 forks
Collector and speech cutter for LibriVox audiobooks
Praaline
27 stars, 5 forks
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora