html-extractor topic

List html-extractor repositories

css-from-html-extractor

5
Stars
1
Forks

PHP library that determines which CSS is used by HTML snippets.

sumy

3.4k
Stars
523
Forks

Module for automatic summarization of text documents and HTML pages.

essence

114
Stars
14
Forks

Automatically extracts the main text content (and more) from an HTML document.

breadability

203
Stars
26
Forks

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative).

html-extractor

50
Stars
11
Forks

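Optimization of a general-purpose web page main-text extraction algorithm based on a line-block distribution function, implemented in Python.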

textractor

15
Stars
4
Forks

Extracts the main text from HTML, intended for news web pages.