Crawling Zhihu: playwright._impl._errors.TimeoutError after QR-code login
⚠️ Pre-submission checklist
- [x] I have carefully read the FAQ of common problems encountered when using this project
- [x] I have searched and reviewed the closed issues
- [x] I have confirmed this is not caused by common issues such as slider CAPTCHAs, expired cookies, incorrectly extracted cookies, or platform risk control
❓ Problem description
When crawling Zhihu, the program fails with playwright._impl._errors.TimeoutError after I log in by scanning the QR code. I have already confirmed this is not a VPN issue.
🔍 Usage scenario
- Target platform: Zhihu
- Feature used: crawling posts by keyword
💻 Environment info
- OS: Windows 10
- Python version: 3.9.21
- IP proxy in use: no
- VPN software in use: no
- Target platform (Douyin/Xiaohongshu/Weibo/etc.): Zhihu
📋 Error log
```
(crawlEnv) PS E:\GraProgram\Crawler\MediaCrawler> python main.py
2025-02-21 23:39:32 MediaCrawler INFO (core.py:345) - [ZhihuCrawler.launch_browser] Begin create browser context ...
2025-02-21 23:39:34 MediaCrawler INFO (core.py:316) - [ZhihuCrawler.create_zhihu_client] Begin create zhihu API client ...
2025-02-21 23:39:34 MediaCrawler INFO (client.py:137) - [ZhiHuClient.pong] Begin to pong zhihu...
2025-02-21 23:39:34 MediaCrawler ERROR (client.py:147) - [ZhiHuClient.pong] Ping zhihu failed: SyntaxError: 语法错误, and try to login again...
2025-02-21 23:39:34 MediaCrawler INFO (login.py:58) - [ZhiHu.begin] Begin login zhihu ...
2025-02-21 23:39:34 MediaCrawler INFO (login.py:74) - [ZhiHu.login_by_qrcode] Begin login zhihu by qrcode ...
Traceback (most recent call last):
  File "E:\GraProgram\Crawler\MediaCrawler\main.py", line 66, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "E:\anaconda3\envs\crawlEnv\lib\asyncio\base_events.py", line 647, in run_until_complete
    return future.result()
  File "E:\GraProgram\Crawler\MediaCrawler\main.py", line 56, in main
    await crawler.start()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\core.py", line 85, in start
    await login_obj.begin()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\login.py", line 60, in begin
    await self.login_by_qrcode()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\login.py", line 77, in login_by_qrcode
    base64_qrcode_img = await utils.find_qrcode_img_from_canvas(
  File "E:\GraProgram\Crawler\MediaCrawler\tools\crawler_util.py", line 68, in find_qrcode_img_from_canvas
    canvas = await page.wait_for_selector(canvas_selector)
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\async_api\_generated.py", line 7786, in wait_for_selector
    await self._impl_obj.wait_for_selector(
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_page.py", line 373, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_frame.py", line 323, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 509, in wrap_api_call
    return await cb()
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.
```
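The timeout is raised by `page.wait_for_selector(canvas_selector)` in `tools/crawler_util.py`: the QR-code `<canvas>` never appeared within Playwright's default 30 s wait. A minimal standalone probe, independent of MediaCrawler, can show whether the login canvas renders at all; the signin URL and the bare `canvas` selector below are assumptions (the crawler passes its own `canvas_selector`):

```python
import asyncio

from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright


async def probe_qrcode_canvas() -> None:
    async with async_playwright() as p:
        # Headed mode so the login dialog stays visible while we wait.
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://www.zhihu.com/signin")  # assumed login URL
        try:
            # Same call that times out in crawler_util.py, with double the wait.
            canvas = await page.wait_for_selector("canvas", timeout=60_000)
            print("QR canvas rendered:", await canvas.bounding_box())
        except PlaywrightTimeoutError:
            # Capture what actually rendered (slider CAPTCHA? blank page?).
            await page.screenshot(path="zhihu_login_timeout.png", full_page=True)
            print("Canvas never appeared; inspect zhihu_login_timeout.png")
        finally:
            await browser.close()


asyncio.run(probe_qrcode_canvas())
```

If the screenshot shows a slider CAPTCHA or an empty login dialog, the failure is on Zhihu's side (risk control or a blocked resource) rather than in Playwright itself.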
📷 Error screenshot
Delete the browser_data directory and try again.
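A minimal sketch of that cleanup, assuming `browser_data/` sits at the repository root (the directory where MediaCrawler keeps its persistent Playwright login state):

```python
import shutil
from pathlib import Path

# Assumed location: <repo root>/browser_data, holding persistent
# Playwright user data (cookies, cache) from earlier logins.
browser_data = Path("browser_data")
if browser_data.exists():
    shutil.rmtree(browser_data)
    print(f"Removed stale login state: {browser_data.resolve()}")
else:
    print("No browser_data directory found; nothing to clean.")
```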
Node version is 22.13.0. After deleting the browser_data directory, the terminal prints a new error:

```
Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 140, in _call_sslobject_method
    result = func(*args)
  File "D:\anaconda\envs\crawl\lib\ssl.py", line 944, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLSyscallError: Some I/O error occurred (_ssl.c:1129)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_exceptions.py", line 10, in map_exceptions
    yield
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 78, in start_tls
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 69, in start_tls
    ssl_stream = await anyio.streams.tls.TLSStream.wrap(
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 132, in wrap
    await wrapper._call_sslobject_method(ssl_object.do_handshake)
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 161, in _call_sslobject_method
    raise BrokenResourceError from exc
anyio.BrokenResourceError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 60, in map_httpcore_exceptions
    yield
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection_pool.py", line 262, in handle_async_request
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\http_proxy.py", line 271, in handle_async_request
    connect_response = await self._connection.handle_async_request(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 92, in handle_async_request
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 69, in handle_async_request
    stream = await self._connect(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 149, in _connect
    stream = await stream.start_tls(**kwargs)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 78, in start_tls
    raise exc
  File "D:\anaconda\envs\crawl\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\client.py", line 83, in request
    response = await client.request(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "D:\anaconda\envs\crawl\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\cxy\GraCrawl\MediaCrawler-main\main.py", line 66, in <module>
```
Please rule out whether a VPN is running.
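One clue supporting that: the `httpcore.ConnectError` traceback above goes through `httpcore\_async\http_proxy.py`, so httpx is tunnelling through a proxy even though the report says none is configured; that usually means a system-wide proxy or VPN is being picked up from the environment. A small probe (a sketch, not project code; the URL is an assumption) that compares behaviour with and without environment proxy settings:

```python
import asyncio

import httpx


async def probe(trust_env: bool) -> None:
    # trust_env=False ignores HTTP_PROXY/HTTPS_PROXY and system proxy
    # settings; failing only with trust_env=True points at the proxy/VPN layer.
    try:
        async with httpx.AsyncClient(trust_env=trust_env, timeout=10) as client:
            resp = await client.get("https://www.zhihu.com")
            print(f"trust_env={trust_env}: HTTP {resp.status_code}")
    except httpx.HTTPError as exc:
        print(f"trust_env={trust_env}: {type(exc).__name__}: {exc}")


async def main() -> None:
    await probe(trust_env=True)   # httpx default; inherits any system proxy
    await probe(trust_env=False)  # bypasses environment proxies entirely


asyncio.run(main())
```

If the `trust_env=False` request succeeds while the default one raises `ConnectError`, the VPN or system proxy is intercepting the TLS handshake and should be disabled (or configured as an explicit proxy) before running the crawler.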