you-get
you-get copied to clipboard
过期时间为0的cookie,无法被携带在请求中,导致请求失败。
我参考了发问题的要求,但我这个似乎不能按照那个格式来。这是我第一次着手参与到开源项目里,下面是关于cookie的问题和测试过程的详细描述。(我看有很多中文的提交,我就偷了写英文的懒)
问题
过期时间为0的cookie无法被携带在请求中。
我想添加对智慧树(一个学习网站)的支持,它所使用的cookie的过期时间为0,如果我们不手动更改导出cookie的过期时间,我们将无法请求到相关的资源。
但我认为不应该让用户去编辑cookie,所以我做了一些测试,试图解决它,但我遇到一些困难。我一方面不想更改cookie的信息(比如把过期时间设置成今后的某个时间),我觉得我们应该保持“我还在浏览器里”的那种状态,另一方面过期时间为0的cookie就算被生成,在发送请求的时候也不会被携带。
测试过程
我刚写好了智慧树的脚本,开始对脚本开始测试,发现请求uuid失败,也就是说有些cookie没有按预期加载,但我确实加载了cookie文件
./you-get https://studyh5.zhihuishu.com/videoStudy.html\#/studyVideo\?recruitAndCourseId\=4e505b5f45524258454a585858435d46 -l -c zhihuishu.com_cookies.txt --debug
[DEBUG] get_content: https://studyservice.zhihuishu.com/login/getLoginUserInfo?dateFormate=1620567306
you-get: version 0.4.1524, a tiny downloader that scrapes the web.
you-get: Namespace(version=False, help=False, info=False, url=False, json=False, no_merge=False, no_caption=False, force=False, skip_existing_file_size_check=False, format=None, output_filename=None, output_dir='.', player=None, cookies='zhihuishu.com_cookies.txt', timeout=600, debug=True, input_file=None, password=None, playlist=True, first=None, last=None, size=None, auto_rename=False, insecure=False, http_proxy=None, extractor_proxy=None, no_proxy=False, socks_proxy=None, stream=None, itag=None, URL=['https://studyh5.zhihuishu.com/videoStudy.html#/studyVideo?recruitAndCourseId=4e505b5f45524258454a585858435d46'])
Traceback (most recent call last):
File "/home/zy/githubProject/you-get/./you-get", line 11, in <module>
you_get.main(repo_path=_filepath)
File "/home/zy/githubProject/you-get/src/you_get/__main__.py", line 92, in main
main(**kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1831, in main
script_main(any_download, any_download_playlist, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1712, in script_main
download_main(
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1337, in download_main
download_playlist(url, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1827, in any_download_playlist
m.download_playlist(url, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/extractors/zhihuishu.py", line 78, in download_playlist
uuid = self.get_uuid()
File "/home/zy/githubProject/you-get/src/you_get/extractors/zhihuishu.py", line 25, in get_uuid
raise KeyError('Can not request uuid Please check you cookies')
KeyError: 'Can not request uuid Please check you cookies'
然后找了找代码,发现在you-get的common模块中这段代码,把这样过期时间为0的cookie给过滤掉了
然后我把ignore_expires设为True,增加了一条日记的打印,继续测试,发现还是出现请求没有权限的问题,但这次cookie没有continue过去,所以应该是加载了。
[DEBUG] get_content: https://studyservice.zhihuishu.com/login/getLoginUserInfo?dateFormate=1620565443
you-get: version 0.4.1524, a tiny downloader that scrapes the web.
you-get: Namespace(version=False, help=False, info=False, url=False, json=False, no_merge=False, no_caption=False, force=False, skip_existing_file_size_check=False, format=None, output_filename=None, output_dir='.', player=None, cookies='zhihuishu.com_cookies.txt', timeout=600, debug=True, input_file=None, password=None, playlist=True, first=None, last=None, size=None, auto_rename=False, insecure=False, http_proxy=None, extractor_proxy=None, no_proxy=False, socks_proxy=None, stream=None, itag=None, URL=['https://studyh5.zhihuishu.com/videoStudy.html#/studyVideo?recruitAndCourseId=4e505b5f45524258454a585858435d46'])
Traceback (most recent call last):
File "/home/zy/githubProject/you-get/./you-get", line 11, in <module>
you_get.main(repo_path=_filepath)
File "/home/zy/githubProject/you-get/src/you_get/__main__.py", line 92, in main
main(**kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1831, in main
script_main(any_download, any_download_playlist, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1712, in script_main
download_main(
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1337, in download_main
download_playlist(url, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/common.py", line 1827, in any_download_playlist
m.download_playlist(url, **kwargs)
File "/home/zy/githubProject/you-get/src/you_get/extractors/zhihuishu.py", line 78, in download_playlist
uuid = self.get_uuid()
File "/home/zy/githubProject/you-get/src/you_get/extractors/zhihuishu.py", line 25, in get_uuid
raise KeyError('Can not request uuid Please check you cookies')
KeyError: 'Can not request uuid Please check you cookies'
后面我还是觉得是expiry的问题,于是我暴力的把所有cookie的过期时间都改成了未来的时间,再进行测试,发现按预期正确的请求和解析相关数据和地址了。
./you-get https://studyh5.zhihuishu.com/videoStudy.html\#/studyVideo\?recruitAndCourseId\=4e505b5f45524258454a585858435d46 -l -c zhihuishu.com_cookies.txt --debug
[DEBUG] get_content: https://studyservice.zhihuishu.com/login/getLoginUserInfo?dateFormate=1620568489
[DEBUG] post_content: https://studyservice.zhihuishu.com/learning/videolist
post_data: {'uuid': '隐藏处理', 'recruitAndCourseId': '4e505b5f45524258454a585858435d46', 'dateFormate': '1620568489'}
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12181395&_=1620568489
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12181845&_=1620568490
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12182101&_=1620568490
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12182227&_=1620568490
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12182055&_=1620568490
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12182189&_=1620568491
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12182277&_=1620568491
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12183547&_=1620568491
[DEBUG] get_content: https://newbase.zhihuishu.com/video/initVideo?jsonpCallBack=result&videoID=12221821&_=1620568491
最后
因为这些cookie的实际可使用的时间就不长,加上cookie的特殊性质我不能一同附上来,所以如果需要配合复现问题,可以联系我,我将会积极配合。
提交的代码包括对智慧树网站的一些支持,以及修改了common里的代码,把所有的cookie的过期都设置在了未来(但我没有对其他网站进行测试,所有我不知道这样是否会导致一些别的问题)
Hello @Zy3122971650, Thanks for the Pull Request. We :heart: our contributors! Please wait for one of our human maintainers to review your patches. This may take a few days to weeks. Also, please understand that although your Pull Request may or may not be eventually merged, we value all contributions equally.
祝您健康!