jparser

A readability parser that extracts the title, content, and images from HTML pages

Results 5 jparser issues

- make code compatible with Python 3
- cleaning and linting
- try/catch fix around type error in model.py

A check needs to be added. In model.py, add: if not isinstance(t, str): continue ` import re import lxml import lxml.html import urllib.parse from .tags_util import clean_tags_only, clean_tags_hasprop, clean_tags_exactly, clean_tags from .region import Region class PageModel(object): def...
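The guard proposed in this issue can be sketched in isolation. The truncated snippet does not show what `t` is in model.py, so the function name `count_string_tags` and the sample data below are hypothetical and only illustrate skipping non-string entries in a loop. One plausible source of such entries is lxml itself, where comment nodes expose a callable rather than a string as their `.tag`.

```python
def count_string_tags(tags):
    """Count tag names case-insensitively, skipping entries that are not str.

    Hypothetical helper: it mirrors the proposed guard, not jparser's code.
    """
    counts = {}
    for t in tags:
        # The guard from the issue: ignore anything that is not a string,
        # so string methods such as .lower() never raise a TypeError.
        if not isinstance(t, str):
            continue
        key = t.lower()
        counts[key] = counts.get(key, 0) + 1
    return counts


# Mixed input: strings plus values that would break string handling.
mixed = ["div", "P", None, 42, "p", len]
result = count_string_tags(mixed)
```

Without the `isinstance` check, the call to `.lower()` would fail on the first non-string entry.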

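Hello, I have recently been working on something similar. The main text of a typical web page contains a lot of redundant noise that needs to be removed. Is there any plan to add this later?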

Change model.py to: ` #!/bin/env python #encoding=utf-8 import re import lxml import lxml.html import urllib from .tags_util import clean_tags_only, clean_tags_hasprop, clean_tags_exactly, clean_tags from .region import Region class PageModel(object): def __init__(self, page,...