
python main.py --platform dy --lt qrcode --type detail fails with: Get aweme detail error: Expecting value: line 1 column 1 (char 0),

Open PhantomTide opened this issue 1 year ago • 4 comments

2024-04-14 23:45:21 MediaCrawler ERROR [DouYinCrawler.get_aweme_detail] Get aweme detail error: Expecting value: line 1 column 1 (char 0),
2024-04-14 23:45:21 MediaCrawler ERROR [DouYinCrawler.get_aweme_detail] Get aweme detail error: Expecting value: line 1 column 1 (char 0),

I log in to Douyin with a QR code and save the data as JSON. Testing produces the error above, and I don't know what causes it.

Command executed:

    python main.py --platform dy --lt qrcode --type detail

The code where the error is logged:

    async def get_aweme_detail(self, aweme_id: str, semaphore: asyncio.Semaphore) -> Any:
        """Get note detail"""
        async with semaphore:
            try:
                print("aweme_id:", aweme_id)
                return await self.dy_client.get_video_by_id(aweme_id)
            except DataFetchError as ex:
                utils.logger.error(f"[DouYinCrawler.get_aweme_detail] Get aweme detail error: {ex}")
                return None
            except KeyError as ex:
                utils.logger.error(
                    f"[DouYinCrawler.get_aweme_detail] have not fund note detail aweme_id:{aweme_id}, err: {ex}")
                return None

The line print("aweme_id:", aweme_id) runs normally and prints aweme_id: 7280854932641664319.

PhantomTide avatar Apr 14 '24 15:04 PhantomTide

With a breakpoint set, I found that the exception is raised in:

    async def get_video_by_id(self, aweme_id: str) -> Any:
        """
        DouYin Video Detail API
        :param aweme_id:
        :return:
        """
        params = {
            "aweme_id": aweme_id
        }
        headers = copy.copy(self.headers)
        # headers["Cookie"] = "s_v_web_id=verify_lol4a8dv_wpQ1QMyP_xemd_4wON_8Yzr_FJa8DN1vdY2m;"
        del headers["Origin"]
        res = await self.get("/aweme/v1/web/aweme/detail/", params, headers)
        return res.get("aweme_detail", {})

This is the line that fails:

    res = await self.get("/aweme/v1/web/aweme/detail/", params, headers)

PhantomTide avatar Apr 14 '24 15:04 PhantomTide

Tracing further leads to:

    async def request(self, method, url, **kwargs):
        async with httpx.AsyncClient(proxies=self.proxies) as client:
            response = await client.request(
                method, url, timeout=self.timeout,
                **kwargs
            )
            try:
                return response.json()
            except Exception as e:
                raise DataFetchError(f"{e}, {response.text}")

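At this point I see:

    response.status_code: 200
    response.content: b''

So the server answers with HTTP 200 but an empty body, and response.json() fails to parse it.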

What is causing this?
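
For reference, the message in the log is exactly what Python's json module raises when it is handed an empty string, and the trailing comma comes from the f"{e}, {response.text}" format when response.text is empty. The sketch below reproduces that with the standard library only. The helper names parse_json_body and describe_failure are hypothetical and are not part of MediaCrawler; checking for an empty body before parsing is one possible way to get a clearer error, not the project's actual behavior.

```python
import json


def describe_failure(text: str) -> str:
    """Return the json error message for `text`, or "ok" if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as ex:
        return str(ex)
    return "ok"


def parse_json_body(text: str) -> dict:
    """Parse a response body, reporting an empty body explicitly.

    Hypothetical helper: an HTTP 200 with an empty body is treated as a
    distinct failure instead of a generic JSON decode error.
    """
    if not text.strip():
        raise ValueError("empty response body (status code may still be 200)")
    return json.loads(text)


if __name__ == "__main__":
    # An empty body gives the same message that appears in the log.
    print(describe_failure(""))
    # A non-empty JSON body parses normally.
    print(parse_json_body('{"aweme_detail": {}}'))
```

With a check like this, the log would point at the empty body directly rather than at a JSON parsing problem.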

PhantomTide avatar Apr 14 '24 15:04 PhantomTide

Searching by keyword crawls data successfully, but fetching by ID fails. The search command:

    python main.py --platform dy --lt qrcode --type search

2024-04-15 00:03:03 MediaCrawler INFO [DouYinCrawler.search] Begin search douyin keywords
2024-04-15 00:03:03 MediaCrawler INFO [DouYinCrawler.search] Current keyword: python
2024-04-15 00:03:03 MediaCrawler INFO [DouYinCrawler.search] Skip 0
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7086069289383447838, title:如何高效的学习#python #程序开发 #编程开发?如何快速掌握python开发技能?#程序员 #编程 #编程入门 #知识分享 #创作灵感 #dou十小助手 #python学习
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7329148648820870434, title:搞爬虫事真能进去呀#编程 #python #高薪 #爬虫 #计算机
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7163907009899728165, title:#干货分享 #大一新生必看 #大学生 所有和你说Python好学的人都是在骗你!学长给你推荐Python自学路径,码住!
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7018095395666447649, title:#python 还能转换为#C语言 ?还能隐藏源码?运行速度还起飞?#编程
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7107160174455999759, title:#python #编程 #涨知识 十行代码,让小白也能玩转python
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7297938957189336330, title:大型纪录片-python公务员之路 #程序员 #计算机#编程 @DOU+小助手
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7308574705349315866, title:就书和视频就行,硬学三个月,甚至都用不到三个月,别花那么多报课#经验分享 #零基础教学 #python编程 #数据分析#超实用学习方法
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7283082227653643554, title:一小时学会python编程?第七课 #python #python入门 #编程
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7069621798928764174, title:Python爬虫脚本的三种技术,你会几种#python #Python爬虫#Python脚本
2024-04-15 00:03:04 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7127225115523288351, title:入门数据分析最危险的大坑!你踩过吗?#数据分析 #Python #SQL #Tableau #知识分享
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7357741953158630207, title:
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7276372612945693952, title:研究生掌握一些Python的基础操作真的很方便~ #Python #研究生日常 #科研狗的日常 #读博的日子 #干货分享
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7310904386539097354, title:理解这四行编程代码,让你的逻辑思维再上一层楼!#程序员 #干货分享 #编程 #思维 #每天学习一点点
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7293450413074959625, title:爬虫为什么都用Python做?爬虫只能用Python做吗?#爬虫 #python #程序员 #编程 #互联网
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7307978643190320421, title:小伙自学Python几个月就能赚100多万,你知道这有多吓人吗?#编程 #python编程 #干货分享 #爬虫
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7224051886876298557, title:为什么python可以学但不能乱用!#python #编程
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7048767187770756352, title:今年python还值得学吗?
2024-04-15 00:03:06 MediaCrawler INFO [store.douyin.update_douyin_aweme] douyin aweme id:7094568106994978085, title: #python 是用#c语言 写的,为什么说速度比Python慢那么多?#程序员 #编程 #计算机 #it
2024-04-15 00:03:06 MediaCrawler INFO [DouYinCrawler.search] keyword:python, aweme_list:['7086069289383447838', '7329148648820870434', '7163907009899728165', '7018095395666447649', '7107160174455999759', '7297938957189336330', '7308574705349315866', '7283082227653643554', '7069621798928764174', '7127225115523288351', '7357741953158630207', '7276372612945693952', '7310904386539097354', '7293450413074959625', '7307978643190320421', '7224051886876298557', '7048767187770756352', '7094568106994978085']

PhantomTide avatar Apr 14 '24 16:04 PhantomTide

Strange: I ran the test a few more times and it started working again.

PhantomTide avatar Apr 14 '24 16:04 PhantomTide

This usually means the account has been blocked. It goes back to normal after a while.
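
Since the block is described as temporary, one option is to retry an empty response after a pause instead of failing on the first attempt. The following is only a minimal sketch under that assumption: fetch_json_with_retry and its parameters are hypothetical names, not MediaCrawler code, and fetch_text stands in for whatever coroutine returns the response text. If the block lasts minutes or hours, a few short retries will not get past it.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def fetch_json_with_retry(
    fetch_text: Callable[[], Awaitable[str]],
    retries: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Call fetch_text until it returns a non-empty body, then parse it.

    Waits base_delay * 2**attempt seconds between attempts (exponential
    backoff) and raises RuntimeError if every attempt returns an empty body.
    """
    for attempt in range(retries):
        text = await fetch_text()
        if text.strip():
            return json.loads(text)
        # Empty body: wait before the next attempt, except after the last one.
        if attempt < retries - 1:
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"empty response body after {retries} attempts")


if __name__ == "__main__":
    responses = iter(["", '{"aweme_detail": {"aweme_id": "1"}}'])

    async def fake_fetch() -> str:
        # Stand-in for a real HTTP call: empty once, then a JSON body.
        return next(responses)

    print(asyncio.run(fetch_json_with_retry(fake_fetch, base_delay=0.1)))
```

The delay and retry count are arbitrary illustration values and would need tuning against how long the block actually lasts.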

NanmiCoder avatar Apr 15 '24 13:04 NanmiCoder