cix-extractor-py

Output contains garbled characters

Open xxllp opened this issue 9 years ago • 5 comments

As the title says: I tried a different web page, and printing the result directly gives garbled text.

xxllp avatar Sep 27 '16 07:09 xxllp

@xxllp What is the URL?

rainyear avatar Sep 29 '16 07:09 rainyear

ext = Extractor(url="http://www.ahgd.gov.cn/web_content.php?id=14971", blockSize=5, image=False)
print(ext.getContext())

xxllp avatar Oct 08 '16 08:10 xxllp

The output really is garbled. I switched to parsing the page with BeautifulSoup + html5lib instead.

klzsysy avatar Oct 18 '16 10:10 klzsysy

Actually, I have been wondering why the output has no line breaks, and not even any spaces.

yingshaoxo avatar Jan 18 '17 03:01 yingshaoxo

@klzsysy Set resp.encoding to the page's actual encoding. The text is decoded as UTF-8 by default, so if your page is not UTF-8 the output will certainly be garbled.

ljhzds avatar Apr 14 '17 07:04 ljhzds
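
The encoding fix suggested above can be sketched roughly as follows. This is only an illustration, not code from cix-extractor-py: `decode_html` and `META_CHARSET` are hypothetical names, the sample page is made up, and the choice of gb2312 is an assumption for the demo (the encoding of the page reported in this issue was not stated). When fetching with requests, the equivalent step would be assigning the detected name to `resp.encoding` before reading `resp.text`.

```python
import re

# Hypothetical helper, not part of cix-extractor-py.
# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the charset declared in a <meta> tag.

    Falls back to `fallback` when no charset is declared or the declared
    one is unknown or does not match the bytes.
    """
    match = META_CHARSET.search(raw[:2048])  # declarations sit near the top
    encoding = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: keep going, mark bad bytes.
        return raw.decode(fallback, errors="replace")


# Made-up sample page, encoded as gb2312 (an assumption for this demo).
page = '<html><head><meta charset="gb2312"></head><body>中文网页</body></html>'.encode("gb2312")

garbled = page.decode("utf-8", errors="replace")  # what a UTF-8 default produces
text = decode_html(page)                          # decoded with the declared charset

print("中文网页" in garbled)
print("中文网页" in text)
```

Sniffing the `<meta>` tag is one simple option; a statistical detector or the `Content-Type` response header could serve the same purpose, and the declared charset is not always correct, which is why the sketch keeps a fallback.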