weibo-crawler

Throws an error, but still runs.

Open zhaibin opened this issue 2 years ago • 4 comments

Extra data: line 128 column 2 (char 4697)
Traceback (most recent call last):
  File "weibo.py", line 766, in get_one_weibo
    weibo = self.get_long_weibo(weibo_id)
  File "weibo.py", line 351, in get_long_weibo
    js = json.loads(html, strict=False)
  File "/usr/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 128 column 2 (char 4697)

zhaibin avatar Apr 27 '22 23:04 zhaibin

Same problem here. It ran fine until yesterday and started failing around the evening; I thought I had changed something somewhere.

tuling-xiaofeng avatar Apr 28 '22 04:04 tuling-xiaofeng

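The cause of this bug is that the fetched HTML cannot be parsed as a single JSON object, and json.loads() can only handle a single JSON object. As a result, long weibo posts cannot be crawled. The HTML structure of the Weibo page has probably changed.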

The error occurs here: https://github.com/dataabc/weibo-crawler/blob/0fbc03d80f84d3728993d3693c06462d4bf85d8a/weibo.py#L349-L351

Change it to:

html = html[:html.rfind(',')]
html = html[:html.rfind('][')] (added)
html = '{' + html (modified)
js = json.loads(html, strict=False)
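The "Extra data" error described above can be reproduced outside the crawler. The following is a minimal sketch on a synthetic string (not real Weibo output) showing that json.loads() rejects text left over after the first JSON value, and that the standard library's json.JSONDecoder.raw_decode() parses only the first value and reports where it stopped. Using raw_decode() here is an illustration of the failure mode, not the fix adopted in weibo.py.

```python
import json

# Synthetic input for illustration only: one JSON object followed by leftover text.
text = '{"status": {"id": 1, "text": "hello"}}][ {"other": true}'

# json.loads() requires the whole string to be exactly one JSON value.
try:
    json.loads(text, strict=False)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # the message part of the exception

# raw_decode() parses the first JSON value and returns it together with
# the index at which parsing stopped, so trailing text is tolerated.
decoder = json.JSONDecoder(strict=False)
first_obj, end = decoder.raw_decode(text)
leftover = text[end:]

print(error_message)
print(first_obj)
print(repr(leftover))
```

Note that raw_decode() does not skip leading whitespace, so the string passed to it should start at the first character of the JSON value.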

ffffuturexu avatar Apr 28 '22 07:04 ffffuturexu

Thanks a lot, the problem is solved.

tuling-xiaofeng avatar Apr 28 '22 07:04 tuling-xiaofeng

The HTML structure of long weibo pages has changed. Changing "hotScheme" to "call" in the get_long_weibo method is enough.
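Why a stale marker can surface as a JSON error rather than an obvious failure: str.rfind() returns -1 when the marker is absent, and slicing with [:-1] silently drops only the last character instead of cutting at the marker. The sketch below illustrates marker-based extraction with an explicit check for that case. The helper name extract_fragment, the '"status":' start marker, and the sample page string are all assumptions made for illustration; they are not taken from weibo.py or from a real Weibo response.

```python
import json


def extract_fragment(page, start_marker, end_marker):
    """Cut `page` between start_marker and the last end_marker, then parse it.

    Hypothetical helper: returns the parsed dict, or None if a marker is
    missing or the cut fragment is not valid JSON.
    """
    start = page.find(start_marker)
    end = page.rfind(end_marker)
    # find()/rfind() return -1 on a miss; check explicitly instead of slicing.
    if start == -1 or end == -1 or end <= start:
        return None
    fragment = page[start:end]
    comma = fragment.rfind(",")  # trailing comma before the end marker
    if comma == -1:
        return None
    fragment = fragment[:comma]
    try:
        return json.loads("{" + fragment + "}", strict=False)
    except json.JSONDecodeError:
        return None


# Synthetic page text, invented for this example.
sample_page = 'var data = [{"status": {"id": 1, "text": "long post"},\n "call": 1}][0] || {};'

current = extract_fragment(sample_page, '"status":', '"call"')
stale = extract_fragment(sample_page, '"status":', '"hotScheme"')

print(current)
print(stale)
```

With the marker that is present in the sample, the fragment parses; with the marker that is absent, the helper returns None instead of passing a badly cut string to json.loads().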

mobyw avatar Apr 28 '22 09:04 mobyw