Abustler
While downloading [案例分析--林肯电气公司的激励制度](https://wenku.baidu.com/view/967dcf3181d049649b6648d7c1c708a1294a0a71.html), I found that the font file for page 2, [font_csss](https://wkretype.bdimg.com/retype/pipe/967dcf3181d049649b6648d7c1c708a1294a0a71?pn=2&t=ttf&rn=1&v=6&md5sum=9441fec5acb7cb05f0ffaabb2103b9dc&range=54694-&sign=c1bb0c9bba), could not be fetched. Two changes fixed the problem.

First, by capturing the requests I found that the font file URL was wrong. Start by extracting the ID from coverUrl here:

`cover = re.search(r'https://wkimg.bdimg.com/img/(.*?)\?', html).group(1)`

https://github.com/BoyInTheSun/wks/blob/b2ece163e1f0bee505d81f6f751ef7afef85f324/main.py#L105

Then change temp_dir to cover here:

https://github.com/BoyInTheSun/wks/blob/b2ece163e1f0bee505d81f6f751ef7afef85f324/main.py#L166

This gives a working URL: [font_csss](https://wkretype.bdimg.com/retype/pipe/f1c1c7c10740be1e640e9a81?pn=2&t=ttf&rn=1&v=6&md5sum=9441fec5acb7cb05f0ffaabb2103b9dc&range=54694-&sign=c1bb0c9bba)

Second, urllib could not fetch the data from this URL correctly, but switching to requests worked:

```python
page = requests.get(url=fonts_csss[pagenums[i]], headers=headers)
raw = page.text
```

With these changes the whole document downloads without problems. requests feels much more comfortable to use than urllib.
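For anyone who wants to try the two changes outside of wks, here is a minimal standalone sketch. It is not the code from `main.py`: the helper names `extract_cover_id`, `rewrite_font_url`, and `fetch_text` are hypothetical, and the assumption that only the ID in the `/retype/pipe/<id>?` path segment needs to change is inferred from comparing the broken and working URLs in this issue.

```python
import re

# Same pattern as the snippet in this issue, with the dots escaped so they match literally.
COVER_RE = re.compile(r"https://wkimg\.bdimg\.com/img/(.*?)\?")


def extract_cover_id(html: str) -> str:
    """Return the ID embedded in the first cover image URL found in the page HTML."""
    match = COVER_RE.search(html)
    if match is None:
        raise ValueError("no cover image URL found in page HTML")
    return match.group(1)


def rewrite_font_url(font_url: str, doc_id: str, cover_id: str) -> str:
    """Swap the document ID in the retype/pipe path for the cover ID (assumed fix)."""
    return font_url.replace(
        f"/retype/pipe/{doc_id}?", f"/retype/pipe/{cover_id}?", 1
    )


def fetch_text(url: str, headers: dict, timeout: float = 10.0) -> str:
    """Fetch a URL with requests instead of urllib and return the decoded body."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text
```

Usage would look roughly like `fetch_text(rewrite_font_url(url, doc_id, extract_cover_id(html)), headers)`, where `html`, `doc_id`, and `headers` come from the surrounding download code. The timeout and `raise_for_status()` call are additions of this sketch, not something the original fix required.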