renrenBackup

json.decoder.JSONDecodeError

Open Kinggerm opened this issue 1 year ago • 5 comments

Describe the bug

I used the following command to back up my personal content:

python manage.py fetch -p *** -e *** -s -g -a -b

The earlier downloads were mostly fine, but once it reached the state below, it failed with an error:

    fetch album 311698393 2008.18 (), 评0/分0/赞0
Traceback (most recent call last):
  File "manage.py", line 158, in <module>
    cli()
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "manage.py", line 53, in fetch
    fetched = fetch_user(
  File "/Users/Kinggerm/Downloads/renrenBackup/fetch.py", line 99, in fetch_user
    fetch_album(uid)
  File "/Users/Kinggerm/Downloads/renrenBackup/fetch.py", line 71, in fetch_album
    album_count = crawl_album.get_albums(uid)
  File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 163, in get_albums
    count, after = get_album_list_page(uid, after)
  File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 153, in get_album_list_page
    get_album_summary(aid, uid)
  File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 73, in get_album_summary
    album_data = crawler.get_json(
  File "/Users/Kinggerm/Downloads/renrenBackup/crawl/crawler.py", line 178, in get_json
    r = json.loads(resp.text.replace(",}", "}"))
  File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

To Reproduce: Re-running the original command reproduces the error at the same point, but I would rather not share my account and password.

Kinggerm avatar Aug 09 '22 15:08 Kinggerm

@Kinggerm Renren currently offers no convenient way to view other people's albums, so it is hard to pinpoint the problem.

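My blind guesses: first, the album name contains special characters that break parsing; second, Renren used to return JSON that was not in standard format, so the code applies a replacement, and the original album name (or album description) happens to hit that workaround.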

  File "/Users/Kinggerm/Downloads/renrenBackup/crawl/crawler.py", line 178, in get_json
    r = json.loads(resp.text.replace(",}", "}"))

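I suggest editing the name and description of the album 2008.18 to remove special characters and punctuation; after that the run should be able to continue.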
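To make the two guesses above easier to check, here is a minimal sketch of what the quoted substitution does and which inputs produce an error at `char 0`. The helper names `strip_trailing_commas` and `first_error` are hypothetical and not part of renrenBackup; only the `replace(",}", "}")` call is taken from the traceback.

```python
import json


def strip_trailing_commas(text):
    # Same substitution as the line quoted from crawl/crawler.py.
    return text.replace(",}", "}")


def first_error(text):
    # Hypothetical helper: return (message, position) of the decode error,
    # or None if the text parses.
    try:
        json.loads(strip_trailing_commas(text))
    except json.JSONDecodeError as err:
        return err.msg, err.pos
    return None


# Non-standard JSON with a trailing comma becomes parseable.
fixed = json.loads(strip_trailing_commas('{"name": "2008.18",}'))

# A ",}" inside a string value is rewritten too: the text still parses,
# but the value is silently changed.
altered = json.loads(strip_trailing_commas('{"name": "a,}b"}'))

# An empty body or a non-JSON body (for example an HTML page) fails at
# position 0, which matches "line 1 column 1 (char 0)" in the traceback.
empty_error = first_error("")
html_error = first_error("<html>no permission</html>")
```

In this sketch a `",}"` inside an album name changes the stored value without raising, while an error at position 0 means the response body did not start with a JSON value at all. That suggests looking at the raw response text before drawing conclusions about the album name.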

whusnoopy avatar Aug 09 '22 23:08 whusnoopy

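It seems Renren no longer allows any edits? When I log in on the web I can only browse existing content, and I can no longer find the mobile app.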

Kinggerm avatar Aug 10 '22 18:08 Kinggerm

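Also, many thanks to the developers of this repo for still putting in the effort for late backup-takers like me. Besides feeling nostalgic about Renren, I am really touched!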

Kinggerm avatar Aug 10 '22 18:08 Kinggerm

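@Kinggerm As mentioned in #65, if it is a public album, other people can still fetch it. You could share your uid so that others can help check; your account and password are not needed. The error message you posted contains only the album ID and no uid, so it is hard to re-run it as a test.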

whusnoopy avatar Aug 10 '22 23:08 whusnoopy

@Kinggerm I tried the following, and I still need more information from you before I can locate the problem and fix it or work around it.

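  1. View the specified album directly on the web

Visiting http://www.renren.com/album/{album_id} with the 311698393 from your error message substituted in reports that I do not have permission.

  2. Change the code to fetch the specified album directly

Fetching with the 311698393 from your error message returns the JSON {"errorCode":2010500,"errorMsg":":( cause: java.lang.NullPointerException","server_time":1660186496528}, so it cannot be parsed normally.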

You can change the code yourself: after line 174 of crawl/crawler.py, add this statement

        logger.info("get json: {r}".format(r=resp.text))

Then run it again and post the context from the point where the error occurs.
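Going one step beyond the single log line, a more defensive parsing step could look roughly like the sketch below. It is not the project's actual `get_json`: the names `parse_response` and `RenrenResponseError` are hypothetical, and treating any object with an `errorCode` key as a failure is an assumption based only on the one error payload quoted above.

```python
import json
import logging

logger = logging.getLogger(__name__)


class RenrenResponseError(Exception):
    """Hypothetical exception for responses that cannot be used as data."""


def parse_response(text):
    # Log the raw body first so it is visible even when parsing fails.
    logger.info("get json: %s", text)
    try:
        # Same trailing-comma workaround as in crawl/crawler.py.
        data = json.loads(text.replace(",}", "}"))
    except json.JSONDecodeError as err:
        snippet = text[:200]
        raise RenrenResponseError(
            "response is not JSON: {!r}".format(snippet)
        ) from err
    # Assumption: a top-level "errorCode" key marks a server-side error.
    if isinstance(data, dict) and "errorCode" in data:
        raise RenrenResponseError(
            "server error {}: {}".format(data["errorCode"], data.get("errorMsg"))
        )
    return data
```

With a wrapper like this, a failing album would surface the raw body or the server's `errorMsg` in the exception instead of a bare `JSONDecodeError`, which is the context being asked for here.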

whusnoopy avatar Aug 11 '22 03:08 whusnoopy

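Since nobody else can reproduce this problem to test a fix, and @Kinggerm has not provided any follow-up information, I am closing this issue for now. If there is an update, please reopen it to continue the record.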

whusnoopy avatar Sep 08 '22 07:09 whusnoopy