jparser issues

fix bugs

Python3-compatibility and code cleaning

- make code compatible with Python3
- cleaning and linting
- try/catch fix around type error in model.py

In Python 3, counting the characters for each tag fails because a cyfunction cannot be compared with str

A check needs to be added in model.py: if not isinstance(t, str): continue ` import re import lxml import lxml.html import urllib.parse from .tags_util import clean_tags_only, clean_tags_hasprop, clean_tags_exactly, clean_tags from .region import Region class PageModel(object): def...
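The check described above can be illustrated with a minimal sketch. It is not the code from model.py: the function name `count_text_per_tag` and the sample markup are hypothetical, and the standard library's `xml.etree.ElementTree` is used as a stand-in for lxml so the example runs without extra dependencies. Both libraries represent a comment node as an element whose `tag` is a callable rather than a string, which is why comparing it with a `str` can fail in Python 3.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_text_per_tag(root):
    """Sum the length of the direct text of each element, grouped by tag name.

    Hypothetical helper for illustration; not the jparser implementation.
    """
    counts = Counter()
    for el in root.iter():
        t = el.tag
        # Comment nodes carry a function object as their tag, not a string.
        # Skip them so later string comparisons and sorting stay safe.
        if not isinstance(t, str):
            continue
        counts[t] += len((el.text or "").strip())
    return counts


# Sample tree (assumed for the example) with one comment node appended.
root = ET.fromstring("<div><p>hello</p><p>hi</p><span>abc</span></div>")
root.append(ET.Comment("a comment node"))

counts = count_text_per_tag(root)
print(dict(counts))
```

Without the `isinstance` guard, the comment's callable tag would end up as a key next to the string tags, and any later ordering comparison between the two kinds of key would raise a `TypeError` in Python 3.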

jipiyan

Removing noise data from web page body text

12

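Hello, I have recently been working on something similar. Web page body text usually contains a lot of extraneous noise data that needs to be removed. Is there any plan to add this later?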

xxllp

The library does not support Python 3

1

Change model.py to: ` #!/bin/env python #encoding=utf-8 import re import lxml import lxml.html import urllib from .tags_util import clean_tags_only, clean_tags_hasprop, clean_tags_exactly, clean_tags from .region import Region class PageModel(object): def __init__(self, page,...

hurricanetx
