Error when crawling Douyin videos

Open keeper-jie opened this issue 1 year ago • 2 comments

Following the README, both commands for crawling Xiaohongshu work fine, but the Douyin crawl command fails after I scan the QR code:

(d2l) D:\file\file\Win10Share\md\python\code\MediaCrawler>python main.py
2024-04-16  17:37:09 MediaCrawler INFO [DouYinLogin.login_by_qrcode] Begin login douyin by qrcode...
2024-04-16  17:37:17 MediaCrawler INFO [DouYinLogin.begin] login finished then check login state ...
2024-04-16  17:37:37 MediaCrawler INFO [DouYinLogin.begin] login failed please confirm ...

The base_config.py configuration file:

PLATFORM = "dy"
KEYWORDS = "摔倒"
LOGIN_TYPE = "qrcode"  # qrcode or phone or cookie
COOKIES = ""
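SORT_TYPE = "popularity_descending"  # see the enum values under media_platform.xxx.field for the allowed values; currently only Xiaohongshu is supported
CRAWLER_TYPE = "search"  # crawl type: search (keyword search) | detail (post detail) | creator (creator homepage data)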

keeper-jie, Apr 16 '24 09:04

Switching to cookie login also raises an error:

(d2l) D:\file\file\Win10Share\md\python\code\MediaCrawler>python main.py
2024-04-16  18:05:24 MediaCrawler INFO [DouYinLogin.login_by_cookies] Begin login douyin by cookie ...
2024-04-16  18:05:30 MediaCrawler INFO [DouYinLogin.begin] login finished then check login state ...
2024-04-16  18:05:30 MediaCrawler INFO [DouYinLogin.begin] Login successful then wait for 5 seconds redirect ...
2024-04-16  18:05:35 MediaCrawler INFO [DouYinCrawler.search] Begin search douyin keywords
2024-04-16  18:05:35 MediaCrawler INFO [DouYinCrawler.search] Current keyword: 摔倒
2024-04-16  18:05:35 MediaCrawler INFO [DouYinCrawler.search] Skip 0
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "D:\anaconda\envs\d2l\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "main.py", line 59, in main
    await crawler.start()
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\core.py", line 77, in start
    await self.search()
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\core.py", line 100, in search
    posts_res = await self.dy_client.search_info_by_keyword(keyword=keyword,
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\client.py", line 132, in search_info_by_keyword
    return await self.get("/aweme/v1/web/general/search/single/", params, headers=headers)
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\client.py", line 80, in get
    await self.__process_req_params(params, headers)
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\client.py", line 64, in __process_req_params
    x_bogus = douyin_js_obj.call('sign', query, headers["User-Agent"])
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_abstract_runtime_context.py", line 37, in call
    return self._call(name, *args)
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_external_runtime.py", line 92, in _call
    return self._eval("{identifier}.apply(this, {args})".format(identifier=identifier, args=args))
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_external_runtime.py", line 78, in _eval
    return self.exec_(code)
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_abstract_runtime_context.py", line 18, in exec_
    return self._exec_(source)
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_external_runtime.py", line 88, in _exec_
    return self._extract_result(output)
  File "D:\anaconda\envs\d2l\lib\site-packages\execjs\_external_runtime.py", line 167, in _extract_result
    raise ProgramError(value)
execjs._exceptions.ProgramError: TypeError: Cannot read property 'JS_MD5_NO_COMMON_JS' of null

keeper-jie, Apr 16 '24 10:04

Please check the FAQ and install the specified version of Node.js.

NanmiCoder, Apr 16 '24 13:04
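
For readers hitting the same `ProgramError`: the traceback above goes through `execjs\_external_runtime.py`, so the signing script is run by an external JavaScript runtime. A quick way to see which `node` binary is on PATH, and its version, might look like the sketch below. It uses only the Python standard library; the helper names `parse_node_version` and `installed_node_version` are made up for illustration and are not part of MediaCrawler. This thread does not say which Node.js version is required, so compare the output against the project's FAQ.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def parse_node_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as 'v16.20.0' into a tuple of ints, e.g. (16, 20, 0)."""
    # Drop surrounding whitespace, the leading 'v', and any '-suffix' part.
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def installed_node_version() -> Optional[Tuple[int, ...]]:
    """Return the version of the `node` binary on PATH, or None if unavailable."""
    node_path = shutil.which("node")
    if node_path is None:
        return None
    result = subprocess.run(
        [node_path, "--version"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return parse_node_version(result.stdout)


if __name__ == "__main__":
    version = installed_node_version()
    if version is None:
        print("node was not found on PATH (or could not be run)")
    else:
        print("node version:", ".".join(map(str, version)))
```

Run it inside the same conda environment used for `python main.py`, since that environment's PATH determines which `node` is picked up.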

Solved, thanks. This could be added to the FAQ; I'll open a PR for you.

keeper-jie, Apr 17 '24 02:04

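Hi, does this error mean the account has been banned by Douyin? If so, could a configuration parameter be added to sleep between requests and reduce the request frequency?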

(tiktokdownload) D:\file\file\Win10Share\md\python\code\MediaCrawler>python main.py
2024-04-17  13:44:36 MediaCrawler INFO [DouYinLogin.login_by_cookies] Begin login douyin by cookie ...
2024-04-17  13:44:43 MediaCrawler INFO [DouYinLogin.begin] login finished then check login state ...
2024-04-17  13:44:43 MediaCrawler INFO [DouYinLogin.begin] Login successful then wait for 5 seconds redirect ...
2024-04-17  13:44:48 MediaCrawler INFO [DouYinCrawler.search] Begin search douyin keywords
2024-04-17  13:44:48 MediaCrawler INFO [DouYinCrawler.search] Current keyword: 监控下摔倒
2024-04-17  13:44:48 MediaCrawler INFO [DouYinCrawler.search] Skip 0
Traceback (most recent call last):
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\main.py", line 68, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "D:\anaconda\envs\tiktokdownload\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\main.py", line 59, in main
    await crawler.start()
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\core.py", line 77, in start
    await self.search()
  File "D:\file\file\Win10Share\md\python\code\MediaCrawler\media_platform\douyin\core.py", line 108, in search
    for post_item in posts_res.get("data"):
TypeError: 'NoneType' object is not iterable

keeper-jie, Apr 17 '24 05:04
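
The last traceback shows only that `posts_res.get("data")` returned `None`; the thread does not establish whether the account was actually banned. As a rough sketch of the two ideas raised in the question, the code below treats a missing `data` field as an empty result instead of crashing, and sleeps for a random interval between requests. The names `extract_posts` and `polite_sleep`, and the default delay bounds, are hypothetical and not part of MediaCrawler.

```python
import asyncio
import random
from typing import Any, Dict, List, Optional


def extract_posts(posts_res: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the 'data' list of a search response, or [] if it is missing or None."""
    if not posts_res:
        return []
    return posts_res.get("data") or []


async def polite_sleep(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random interval between requests and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay


async def demo() -> None:
    # Stand-in responses; a real crawler would get these from its HTTP client.
    fake_responses = [{"data": [{"aweme_id": "1"}]}, {"data": None}, None]
    for response in fake_responses:
        posts = extract_posts(response)
        print(f"got {len(posts)} post(s)")
        await polite_sleep(0.0, 0.1)  # short bounds so the demo finishes quickly


if __name__ == "__main__":
    asyncio.run(demo())
```

In a search loop, iterating over `extract_posts(posts_res)` rather than `posts_res.get("data")` would avoid the `TypeError`, although an empty result may still indicate that the request was rejected upstream.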