MediaCrawler

Crawling Xiaohongshu fails with a "too frequent access" error

Open Machoman6 opened this issue 1 year ago • 5 comments

Traceback (most recent call last):
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 99, in request
    raise DataFetchError(data.get("msg", None))
media_platform.xhs.exception.DataFetchError: 访问频次异常,请勿频繁操作或重启试试

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\clpq\MediaCrawler-main\main.py", line 55, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Users\zxnb\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\main.py", line 45, in main
    await crawler.start()
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 78, in start
    await self.search()
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 138, in search
    await self.batch_get_note_comments(note_id_list)
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 252, in batch_get_note_comments
    await asyncio.gather(*task_list)
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 258, in get_comments
    await self.xhs_client.get_note_all_comments(
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 288, in get_note_all_comments
    comments_res = await self.get_note_comments(note_id, comments_cursor)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 249, in get_note_comments
    return await self.get(uri, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 116, in get
    return await self.request(method="GET", url=f"{self.host}{final_uri}", headers=headers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x24f2e98f8c0 state=finished raised DataFetchError>]

Machoman6, Oct 02 '24 05:10

How many items had you crawled when this error appeared?

luyixiao31, Oct 04 '24 02:10

I ran into this too, while crawling Xiaohongshu comments, after roughly 2,000 of them. There is no real fix for it; all you can do is switch IPs.

xiaou61, Oct 05 '24 13:10

Mine stopped working after only thirty-odd items.

xukaizhao, Oct 09 '24 07:10

I tested it, and it looks like you get rate-limited after about 20 search keywords.

97wgl, Oct 12 '24 03:10

Did any of you get this resolved?

bnuside, Sep 19 '25 07:09
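
Not a confirmed fix, but for anyone landing here: the server message in the first traceback (访问频次异常,请勿频繁操作或重启试试, roughly "abnormal access frequency; do not operate so frequently, or try restarting") is raised while comments are fetched concurrently through `asyncio.gather`, and tenacity gives up after its retries. One thing that may help is slowing the client down. Below is a minimal sketch of that idea, assuming the limit really is about request frequency; the reports above suggest it can also be tied to the IP or to a total count, in which case this would only delay the error. `Throttle`, `fetch_with_backoff`, and `RateLimited` are hypothetical names for illustration, not part of MediaCrawler.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical stand-in for the crawler's DataFetchError."""


class Throttle:
    """Caps concurrency and enforces a minimum gap between request starts."""

    def __init__(self, max_concurrency: int, min_interval: float) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0  # event-loop time at which the next request may start

    async def run(self, coro_fn, *args):
        async with self._sem:
            # Serialize the "when may I start?" decision so starts are spaced out.
            async with self._lock:
                loop = asyncio.get_running_loop()
                wait = self._next_start - loop.time()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = loop.time() + self._min_interval
            return await coro_fn(*args)


async def fetch_with_backoff(throttle, coro_fn, *args, retries=4, base_delay=1.0):
    """Retry on RateLimited with exponential backoff plus random jitter."""
    for attempt in range(retries):
        try:
            return await throttle.run(coro_fn, *args)
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```

To try it, create one `Throttle` inside the running event loop (for example `Throttle(max_concurrency=1, min_interval=2.0)`; both values are guesses to tune) and wrap each comment fetch as `fetch_with_backoff(throttle, your_fetch_coroutine, note_id)` before passing the tasks to `asyncio.gather`.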