bdwenku-spider
Cannot extract the URLs from the page source for doc-format documents
# Extract the data URLs in bulk from the page source
all_addr = re.findall(r'wkbos\.bdimg\.com.*?json.*?expire.*?\}',source_html)
This line of code returns no matches. Any help would be appreciated.
Solved. Change the code as follows:
# Extract the data URLs in bulk from the page source
all_addr = re.findall(r'wkbjcloudbos\.bdimg\.com.*?json.*?\}',source_html)
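For anyone hitting the same problem, here is a minimal sketch of a pattern that accepts both host names mentioned in this thread (`wkbos.bdimg.com` and `wkbjcloudbos.bdimg.com`). The `SAMPLE_HTML` string, the `DATA_URL_PATTERN` name, and the `extract_data_urls` helper are hypothetical and only illustrate the idea; the real page source may look different, and the host name may change again.

```python
import re

# Hypothetical, simplified stand-in for the page source (an assumption for
# illustration only; the real markup may differ).
SAMPLE_HTML = (
    '{"pageLoadUrl":"https://wkbjcloudbos.bdimg.com/v1/docconvert/0.json?expire=1"}'
    '{"pageLoadUrl":"https://wkbjcloudbos.bdimg.com/v1/docconvert/1.json?expire=2"}'
)

# Match either host: "wkbos" or "wkbjcloudbos". The lazy ".*?" parts stop at
# the first "json" and then at the first closing brace after it.
DATA_URL_PATTERN = re.compile(r'wk(?:bjcloud)?bos\.bdimg\.com.*?json.*?\}')


def extract_data_urls(source_html):
    """Return all data-URL fragments, or raise if the pattern finds nothing."""
    all_addr = DATA_URL_PATTERN.findall(source_html)
    if not all_addr:
        # An empty result usually means the pattern no longer fits the page.
        raise ValueError("no data URLs found; check the host name in the page source")
    return all_addr


if __name__ == "__main__":
    for addr in extract_data_urls(SAMPLE_HTML):
        print(addr)
```

Raising on an empty result makes a changed host name show up right away, instead of surfacing later as a confusing failure further down the script.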
After I enter the URL I get the following error. What does it mean?
Please enter the URL of the resource: https://wenku.baidu.com/view/a16dfa456e85ec3a87c24028915f804d2b1687f4.html
The URL you entered is invalid, please enter it again!
Traceback (most recent call last):
  File "BDWK.py", line 235, in main
  File "BDWK.py", line 17, in __init__
  File "BDWK.py", line 30, in get_doc_type_and_title
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 424: illegal multibyte sequence
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "BDWK.py", line 271, in <module>