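Main-content extraction | extract content from HTML

sixgod: web page main-content extraction in Python

Idea

  • Advantages: runs in linear time, builds no DOM tree, and does not depend on specific HTML tags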
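# Fetch a page, then pass the decoded HTML to the extractor.
from vampire.htmlextract import HtmlExtract
import requests

response = requests.get('http://www.fabao365.com/fangchan/167193/')
response.encoding = "utf-8"  # decode the response body as UTF-8
ex = HtmlExtract()
print(ex.get_text(response.text))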
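
The page lists the advantages (linear time, no DOM tree, tag independence) without describing the algorithm behind them. As a rough illustration only, one approach with similar properties is a line-length heuristic: strip the markup, then keep the run of long text lines that holds the most characters. The sketch below is an assumption, not the implementation of HtmlExtract; the names `extract_main_text`, `min_len`, and `max_gap` are hypothetical.

```python
import re
from html import unescape


def extract_main_text(page, min_len=30, max_gap=2):
    """Hypothetical helper: return the densest run of long text lines.

    This is an illustrative heuristic, not the HtmlExtract algorithm.
    """
    # Remove script/style bodies and comments, then every remaining tag.
    page = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", page)
    page = re.sub(r"(?s)<!--.*?-->", "", page)
    text = unescape(re.sub(r"(?s)<[^>]*>", "", page))

    # Keep only non-empty lines; no DOM tree is ever built.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    best = (0, 0, 0)  # (characters in long lines, start index, end index)
    start, end, chars, gap = None, 0, 0, 0
    # Single pass over the lines: grow a run while long lines keep coming,
    # tolerating at most `max_gap` consecutive short lines inside it.
    for i, ln in enumerate(lines):
        if len(ln) >= min_len:
            if start is None:
                start, chars = i, 0
            end, gap = i + 1, 0
            chars += len(ln)
        elif start is not None:
            gap += 1
            if gap > max_gap:
                best = max(best, (chars, start, end))
                start = None
    if start is not None:
        best = max(best, (chars, start, end))

    return "\n".join(lines[best[1]:best[2]])
```

Short navigation and footer lines fall outside the selected run because they never reach `min_len`, which is why the heuristic needs no knowledge of which tags wrap them. Both thresholds are guesses and would need tuning per site.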