webcrawler issues

Why is there a decoding problem?

1

```
Traceback (most recent call last):
  File "F:/PY/20171006webdriber.py", line 88, in <module>
    main()
  File "F:/PY/20171006webdriber.py", line 52, in main
    girls.write(result_bf)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2207' in position 86907: illegal multibyte...
```
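This is an encoding error, not a decoding one. It appears when text is written to a file whose encoding cannot represent a character. Assuming `girls` is a file opened in text mode with `open()` and no `encoding=` argument, Python falls back to the platform's preferred encoding, which is GBK (cp936) on Chinese-locale Windows, and GBK has no slot for U+2207 ("∇"). The sketch below reproduces the failure and shows the usual fix of passing `encoding="utf-8"` explicitly. `can_encode`, `save_text`, `load_text`, and the file name `girls.txt` are hypothetical names used only for illustration.

```python
import os
import tempfile

# U+2207 NABLA ("∇"), the character named in the traceback.
TEXT = "page text containing \u2207"


def can_encode(s: str, codec: str) -> bool:
    """Return True if every character of s can be encoded with codec."""
    try:
        s.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def save_text(path: str, s: str) -> None:
    # Name the encoding explicitly instead of relying on the platform default,
    # which on Chinese-locale Windows is GBK (cp936).
    with open(path, "w", encoding="utf-8") as f:
        f.write(s)


def load_text(path: str) -> str:
    # Read back with the same encoding that was used for writing.
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "girls.txt")  # hypothetical file name
    save_text(out_path, TEXT)
    round_trip = load_text(out_path)

print(can_encode(TEXT, "gbk"), can_encode(TEXT, "utf-8"), round_trip == TEXT)  # False True True
```

Any code that later reads the file must also open it with `encoding="utf-8"`.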

xshkdjsh

'chromedriver' executable needs to be in PATH

2

I am not familiar with bs4. What is this error about? Or, put another way: what is 'chromedriver'?

```bash
Traceback (most recent call last):
  File "girls_crawler_py27.py", line 87, in <module>
    main()...
```
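The message comes from Selenium, not bs4: chromedriver is the separate executable that Selenium uses to drive Chrome, and Selenium could not find it. The fix is either to add the folder containing chromedriver to PATH or to pass its full path explicitly. The sketch below assumes Selenium 4 (Selenium 3 accepted `executable_path=` directly on `webdriver.Chrome`); `find_chromedriver` and `start_chrome` are hypothetical helper names, not part of the repository.

```python
import os
import shutil
from typing import Optional


def find_chromedriver(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable chromedriver path, or None if none is found.

    An explicitly supplied path wins if that file exists; otherwise the
    directories listed in the PATH environment variable are searched.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # shutil.which searches PATH (and PATHEXT on Windows, so chromedriver.exe is found)
    return shutil.which("chromedriver")


def start_chrome(driver_path: str):
    """Start Chrome through the given chromedriver (Selenium 4 API assumed)."""
    # Imported here so the PATH check above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

Typical use is `path = find_chromedriver()` followed by `start_chrome(path)` when `path` is not None. Recent Selenium 4 releases (4.6 and later) also ship Selenium Manager, which can locate or download a matching driver when none is on PATH.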

HisenZhang

Minor changes for Python 3.7

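1. Zhihu's pages have been redesigned: the "view more" button is gone, and more answers now load dynamically as you scroll down. I therefore removed the step in execute_times() that clicks "more".
2. When writing the file, leaving out encoding='utf-8' raises an error.
3. In Python 3.7, extracting a node's inner content with noscript_inner = noscript.get_text() yields an empty string; use noscript_inner = str(noscript) instead to get the node as a string.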
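A plausible explanation for point 3, assumed rather than confirmed here: if the noscript element wraps only an `<img>` tag, `get_text()` returns nothing, because it collects text nodes and an `<img>` has none. `str(noscript)` keeps the markup, so the image URL can still be read from it. The sketch below does that with the standard library's `html.parser` instead of bs4, to stay self-contained; `ImgSrcCollector`, `image_urls`, and the sample markup and URL are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img .../>
        if tag == "img":
            src = dict(attrs).get("src")  # attrs is a list of (name, value) pairs
            if src:
                self.srcs.append(src)


def image_urls(noscript_inner: str) -> List[str]:
    """Return the image URLs in a markup string such as str(noscript)."""
    parser = ImgSrcCollector()
    parser.feed(noscript_inner)
    parser.close()
    return parser.srcs


# Hypothetical markup of the kind str(noscript) might return; the URL is made up.
sample = '<noscript><img src="https://example.com/a.jpg" width="600"/></noscript>'
print(image_urls(sample))  # ['https://example.com/a.jpg']
```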

fff2zrx

Code update

1

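I found that Zhihu's HTML seems to have changed. The old "查看更多回答" (view more answers) button is now "查看全部回答" (view all answers), and it appears at both the top and the bottom of the page. Shouldn't your code be updated to match? (PS: I'm on Windows 7.) The code does run and the images download, but there seem to be a few small problems I'd like to ask you about.

```python
def wait_time(times):
    for i in range(times):
        # Scroll to the bottom so the page loads more content
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        try:
            # Selenium 3 API; removed in Selenium 4.3 (use driver.find_element(By.CSS_SELECTOR, ...))
            driver.find_element_by_css_selector('button.QuestionMainAction').click()
            print("page" + str(i))
            time.sleep(1)
        except:
            # No button found (or the click failed): stop
            break

wait_time(5)
```

I changed it to:

```python
time.sleep(2)
try:
    driver.find_element_by_css_selector('.QuestionMainAction').click()
    time.sleep(1)
    print('成功')  # "success"
except:
    print('失败')  # "failure"
```

because it only needs to be clicked once...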
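Since point 1 of the Python 3.7 issue above reports that answers now load as you scroll, one alternative to clicking a button is to keep scrolling until the page height stops growing. This is a minimal sketch, not the repository's code: `scroll_until_stable` is a hypothetical name, and it takes any callable that behaves like Selenium's `driver.execute_script`, so it would be called as `scroll_until_stable(driver.execute_script)`.

```python
import time
from typing import Callable


def scroll_until_stable(execute_script: Callable[[str], object],
                        max_rounds: int = 20,
                        pause: float = 2.0) -> int:
    """Scroll to the bottom until the page height stops growing.

    execute_script is assumed to behave like Selenium's driver.execute_script.
    Returns the number of scroll rounds performed.
    """
    last_height = execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch of answers
        new_height = execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded; assume the end was reached
        last_height = new_height
    return max_rounds  # safety cap so the loop cannot run forever
```

The height comparison replaces the fixed `times` count of wait_time(): the loop stops as soon as a scroll loads nothing new, while `max_rounds` still bounds very long pages.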

cherryxyz
