Adrien Barbaresi

6 repositories owned by Adrien Barbaresi

German-NLP

414 stars, 59 forks

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

htmldate

113 stars, 27 forks

Fast and robust date extraction from web pages, with Python or on the command-line

simplemma

130 stars, 10 forks

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

trafilatura

3.0k stars, 228 forks

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

courlan

71 stars, 8 forks

Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters

py3langid

32 stars, 7 forks

Faster, modernized fork of the language identification tool langid.py