html-extraction topic
html-extraction repositories
sumy
3.4k stars, 523 forks
Module for automatic summarization of text documents and HTML pages.
breadability
203 stars, 26 forks
Reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative).
hext
51 stars, 3 forks
Domain-specific language for extracting structured data from HTML documents.