html-extraction topic

List html-extraction repositories

sumy

3.4k stars, 523 forks

Module for automatic summarization of text documents and HTML pages.

breadability

203 stars, 26 forks

Reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative).

hext

51 stars, 3 forks

Domain-specific language for extracting structured data from HTML documents.