
Crawling Zhihu raises a playwright._impl error

Open ChloeC857 opened this issue 10 months ago • 3 comments

⚠️ Pre-submission checklist

  • [x] I have carefully read the project's FAQ of common problems
  • [x] I have searched and reviewed the closed issues
  • [x] I confirm this is not caused by common issues such as slider CAPTCHAs, expired cookies, incorrectly extracted cookies, or platform risk control

❓ Problem Description

When crawling Zhihu, after scanning the QR code to log in, the program exits with a playwright._impl error. I have already confirmed this is not a VPN/proxy problem.

🔍 Use Case

  • Target platform: Zhihu
  • Feature used: crawling posts by keyword

💻 Environment

  • OS: Windows 10
  • Python version: 3.9.21
  • Using an IP proxy: no
  • Using VPN software: no
  • Target platform (Douyin/Xiaohongshu/Weibo, etc.): Zhihu

📋 Error Log

(crawlEnv) PS E:\GraProgram\Crawler\MediaCrawler> python main.py                                            
2025-02-21 23:39:32 MediaCrawler INFO (core.py:345) - [ZhihuCrawler.launch_browser] Begin create browser context ...
2025-02-21 23:39:34 MediaCrawler INFO (core.py:316) - [ZhihuCrawler.create_zhihu_client] Begin create zhihu API client ...
2025-02-21 23:39:34 MediaCrawler INFO (client.py:137) - [ZhiHuClient.pong] Begin to pong zhihu...
2025-02-21 23:39:34 MediaCrawler ERROR (client.py:147) - [ZhiHuClient.pong] Ping zhihu failed: SyntaxError: 语法错误, and try to login again...
2025-02-21 23:39:34 MediaCrawler INFO (login.py:58) - [ZhiHu.begin] Begin login zhihu ...
2025-02-21 23:39:34 MediaCrawler INFO (login.py:74) - [ZhiHu.login_by_qrcode] Begin login zhihu by qrcode ...
Traceback (most recent call last):
  File "E:\GraProgram\Crawler\MediaCrawler\main.py", line 66, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "E:\anaconda3\envs\crawlEnv\lib\asyncio\base_events.py", line 647, in run_until_complete
    return future.result()
  File "E:\GraProgram\Crawler\MediaCrawler\main.py", line 56, in main
    await crawler.start()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\core.py", line 85, in start
    await login_obj.begin()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\login.py", line 60, in begin
    await self.login_by_qrcode()
  File "E:\GraProgram\Crawler\MediaCrawler\media_platform\zhihu\login.py", line 77, in login_by_qrcode
    base64_qrcode_img = await utils.find_qrcode_img_from_canvas(
  File "E:\GraProgram\Crawler\MediaCrawler\tools\crawler_util.py", line 68, in find_qrcode_img_from_canvas
    canvas = await page.wait_for_selector(canvas_selector)
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\async_api\_generated.py", line 7786, in wait_for_selector
    await self._impl_obj.wait_for_selector(
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_page.py", line 373, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_frame.py", line 323, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 509, in wrap_api_call
    return await cb()
  File "E:\anaconda3\envs\crawlEnv\lib\site-packages\playwright\_impl\_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.
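The TimeoutError above means `page.wait_for_selector` gave up after its default 30 s window without the QR-code canvas ever appearing. One mitigation (independent of the root cause) is to retry the wait with a growing timeout. The sketch below shows that pattern in plain asyncio so it runs on its own; `slow_lookup` is a hypothetical stand-in for the canvas element appearing, and in MediaCrawler the wrapped call would be the `page.wait_for_selector` call inside `find_qrcode_img_from_canvas`:

```python
# Retry an async lookup with a growing timeout instead of failing after one
# fixed window. slow_lookup is a dummy stand-in for the QR-code canvas
# appearing; replace it with the real Playwright wait in practice.
import asyncio

async def slow_lookup():
    await asyncio.sleep(0.05)          # stand-in for the page rendering the canvas
    return "<canvas>"

async def wait_with_retry(factory, attempts=3, base_timeout=0.01):
    timeout = base_timeout
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(factory(), timeout)
        except asyncio.TimeoutError:
            timeout *= 4               # back off: e.g. 30 s -> 120 s in real use
    raise asyncio.TimeoutError(f"gave up after {attempts} attempts")

result = asyncio.run(wait_with_retry(slow_lookup))
print(result)                          # succeeds on the third, longer attempt
```

Here the first two attempts time out (0.01 s and 0.04 s are shorter than the 0.05 s "render time") and the third succeeds, which is the behaviour you would want against a slow-loading login page.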

📷 Error Screenshot

ChloeC857 avatar Feb 21 '25 15:02 ChloeC857

Delete the browser_data directory and try again.
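For anyone following along, deleting the cached browser profile forces Playwright to create a fresh login context on the next run. A minimal sketch, assuming you are in the MediaCrawler project root (the PowerShell form matches the reporter's Windows setup):

```shell
# PowerShell on Windows:
#   Remove-Item -Recurse -Force .\browser_data
# POSIX shells:
rm -rf browser_data
```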

NanmiCoder avatar Feb 24 '25 01:02 NanmiCoder

Node version is 22.13.0. After deleting the browser_data directory, the terminal prints a new error:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 140, in _call_sslobject_method
    result = func(*args)
  File "D:\anaconda\envs\crawl\lib\ssl.py", line 944, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLSyscallError: Some I/O error occurred (_ssl.c:1129)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_exceptions.py", line 10, in map_exceptions
    yield
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 78, in start_tls
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 69, in start_tls
    ssl_stream = await anyio.streams.tls.TLSStream.wrap(
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 132, in wrap
    await wrapper._call_sslobject_method(ssl_object.do_handshake)
  File "D:\anaconda\envs\crawl\lib\site-packages\anyio\streams\tls.py", line 161, in _call_sslobject_method
    raise BrokenResourceError from exc
anyio.BrokenResourceError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 60, in map_httpcore_exceptions
    yield
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection_pool.py", line 262, in handle_async_request
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\http_proxy.py", line 271, in handle_async_request
    connect_response = await self._connection.handle_async_request(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 92, in handle_async_request
    raise exc
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 69, in handle_async_request
    stream = await self._connect(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_async\connection.py", line 149, in _connect
    stream = await stream.start_tls(**kwargs)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_backends\anyio.py", line 78, in start_tls
    raise exc
  File "D:\anaconda\envs\crawl\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\crawl\lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\client.py", line 83, in request
    response = await client.request(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "D:\anaconda\envs\crawl\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\anaconda\envs\crawl\lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\cxy\GraCrawl\MediaCrawler-main\main.py", line 66, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "D:\anaconda\envs\crawl\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "D:\cxy\GraCrawl\MediaCrawler-main\main.py", line 56, in main
    await crawler.start()
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\core.py", line 97, in start
    await self.search()
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\core.py", line 128, in search
    content_list: List[ZhihuContent] = await self.zhihu_client.get_note_by_keyword(
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\client.py", line 212, in get_note_by_keyword
    search_res = await self.get(uri, params)
  File "D:\cxy\GraCrawl\MediaCrawler-main\media_platform\zhihu\client.py", line 129, in get
    return await self.request(method="GET", url=base_url + final_uri, headers=headers, **kwargs)
  File "D:\anaconda\envs\crawl\lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "D:\anaconda\envs\crawl\lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "D:\anaconda\envs\crawl\lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x2c53ec184c0 state=finished raised ConnectError>]

ChloeC857 avatar Feb 24 '25 06:02 ChloeC857

Please rule out whether a VPN is turned on.
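Note that the traceback goes through httpcore's http_proxy.py, so a proxy is in play even though the environment section says none is configured. A quick way to check for a system-level proxy that a VPN client may have left behind (a hypothetical diagnostic, not part of MediaCrawler):

```python
# Check whether the OS reports a proxy: getproxies() reads the HTTP_PROXY /
# HTTPS_PROXY environment variables and, on Windows, the registry proxy
# settings. A non-empty result here would explain the http_proxy.py frames
# in the traceback above.
import urllib.request

proxies = urllib.request.getproxies()
print("system proxies:", proxies or "none")
```

If this prints a proxy entry, disable the VPN/proxy (or unset the variables) and retry the crawl.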

NanmiCoder avatar Mar 03 '25 06:03 NanmiCoder