CNKI-download
:frog: A crawler for downloading and quickly previewing literature from CNKI (知网)
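Off campus there is no publicly reachable URL, and you have to log in with a username and account before you can download. How can this be handled?
As in the title.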
Probably nothing was found. Single-field searches (subject term, keyword, and so on) all raise this error:

    Traceback (most recent call last):
      File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 246, in <module>
        main()
      File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 240, in main
        search.search_reference(get_uesr_inpt())
      File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 87, in search_reference
        second_get_res.text).group(1)
    AttributeError: 'NoneType'...
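The `AttributeError: 'NoneType'...` at the end of this traceback is consistent with calling `.group(1)` on a `re.search` result that found no match. Below is a minimal sketch of a guard, assuming the value is extracted from the response text with a regular expression. The helper name `extract_first_group`, the pattern, and the sample strings are hypothetical and are not taken from main.py.

```python
import re
from typing import Optional


def extract_first_group(pattern: str, text: str) -> Optional[str]:
    """Return group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        # re.search returns None when there is no match; calling .group(1)
        # on that None is what raises the AttributeError.
        return None
    return match.group(1)


# Hypothetical pattern and page snippets, for illustration only.
PATTERN = r"found (\d+) results"
with_results = extract_first_group(PATTERN, "found 12 results")
without_results = extract_first_group(PATTERN, "no results")
print(with_results, without_results)
```

With a guard like this, the caller can report "no results" to the user instead of crashing when a search comes back empty.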
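I got CNKI login working from a campus IP, but downloading literature requires a CAPTCHA (**for every single paper**). A request from a real browser downloads the paper directly (a Selenium-driven browser also gets a CAPTCHA on every paper). Is some parameter missing, or is it something else? I looked at the download part of CNKI-download: it is just a plain GET request with headers added, and it returns a 404.

```
import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language':...
```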
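CAPTCHA problem
After collecting a little over a hundred papers, the CAPTCHA step failed with the output below (the log message 出现验证码 means "CAPTCHA encountered"):

    ERROR:root:出现验证码
    Traceback (most recent call last):
      File "main.py", line 144, in parse_page
        tr_table.tr.extract()
    AttributeError: 'NoneType' object has no attribute 'tr'

The error occurs whether or not CAPTCHA recognition is enabled. Could you please take a look at what is causing it? Thanks.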
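The traceback suggests that the CAPTCHA page was handed to the parser as if it were a results page, so the expected table was missing. One possible mitigation is to check for the CAPTCHA page before parsing and to wait before retrying. The sketch below assumes a CAPTCHA page can be recognized by a marker substring; `CAPTCHA_MARKER`, `fetch_with_backoff`, and the sample pages are hypothetical, and whether waiting actually clears the CAPTCHA on CNKI is not something this sketch establishes.

```python
import time
from typing import Callable, Optional

# Hypothetical marker: assumes a CAPTCHA page contains this substring.
CAPTCHA_MARKER = "captcha"


def fetch_with_backoff(
    fetch: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it returns a page without the CAPTCHA marker.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and returns None if every attempt yields a CAPTCHA page.
    """
    for attempt in range(max_attempts):
        html = fetch()
        if CAPTCHA_MARKER not in html.lower():
            return html
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None


# Simulated responses: a CAPTCHA page first, then a normal results page.
pages = iter(["<html>CAPTCHA required</html>", "<table><tr>row</tr></table>"])
delays = []  # records the waits instead of actually sleeping
result = fetch_with_backoff(lambda: next(pages), sleep=delays.append)
print(result, delays)
```

Returning `None` after the last attempt lets the caller skip the page or stop cleanly, rather than passing a CAPTCHA page on to `parse_page`.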
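OCR recognition problem
### Problem description

The forked code does not work out of the box, so I modified it a little:

```python
def depoint(self, img):
    """Denoise an image that has already been binarized."""
    pixdata = img.load()
    w, h = img.size
    for y in range(1, h - 1):
        for x in range(1, w - 1):...
```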
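Because the snippet is cut off, the denoising rule it applies is not visible. The sketch below shows one common rule, neighbor counting, on a plain 2D list so that it runs without an imaging library. It is an assumption about what `depoint` might do, not the poster's code; the threshold `min_white_neighbors` and the pixel values are illustrative.

```python
from typing import List

WHITE = 255  # background
BLACK = 0    # ink


def depoint(pixels: List[List[int]], min_white_neighbors: int = 7) -> List[List[int]]:
    """Return a copy in which isolated black pixels are turned white.

    An interior black pixel is cleared when at least min_white_neighbors of
    its 8 neighbors are white. Border pixels are left untouched, mirroring
    the range(1, h - 1) and range(1, w - 1) loops of the original snippet.
    """
    h = len(pixels)
    w = len(pixels[0]) if h else 0
    result = [row[:] for row in pixels]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if pixels[y][x] != BLACK:
                continue
            # Count white pixels among the 8 surrounding positions.
            white = sum(
                pixels[y + dy][x + dx] == WHITE
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if white >= min_white_neighbors:
                result[y][x] = WHITE
    return result


noisy = [
    [255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255],  # isolated speck at row 1, column 1
    [255, 255, 255,   0,   0],
    [255, 255, 255,   0,   0],  # part of a 2x2 stroke
    [255, 255, 255, 255, 255],
]
cleaned = depoint(noisy)
```

Neighbors are counted on the input and changes are written to a copy, so clearing one pixel does not influence the decision for the next one.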
!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> Overview page function ShowGroup(p1, p2, p3) { return parent.ShowGroup(p1, p2, p3); } $(document).ready(function () { qkInfoCall(); setAuShow(); // GetHeat(); window.parent.HideWaitDiv(); SetFrameHeight(); isHasAddFav(); try{...
Download code
On line 218 of main.py, should `refence_file = requests.get(self.download_url, headers=HEADER)` be changed to `refence_file = self.session.get(self.download_url)`?
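The practical difference between the two calls is that a bare `requests.get` starts with an empty cookie jar, while a `requests.Session` sends the cookies it has collected so far. The sketch below makes this visible without touching the network by preparing both requests and comparing their headers. The URL, the cookie name and value, and the shortened `HEADER` are hypothetical stand-ins, not values from main.py.

```python
import requests

# Hypothetical values: they stand in for a real download link and for
# whatever cookie the site sets during login and search.
DOWNLOAD_URL = "http://example.com/download?id=1"
HEADER = {"User-Agent": "Mozilla/5.0"}

session = requests.Session()
session.headers.update(HEADER)
session.cookies.set("sessionid", "abc123")

# What a bare requests.get(url, headers=HEADER) would send: no cookies.
bare = requests.Request("GET", DOWNLOAD_URL, headers=HEADER).prepare()

# What session.get(url) would send: session headers plus stored cookies.
via_session = session.prepare_request(requests.Request("GET", DOWNLOAD_URL))

print(bare.headers.get("Cookie"))
print(via_session.headers.get("Cookie"))
```

If the download link is only valid for the session that performed the search, the request without the cookie would plausibly be rejected, which would fit the failed downloads reported in these issues. That cause is an assumption rather than something confirmed here.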
CNKI anti-scraping
CNKI has changed its page source and now hides the content returned after a search, so the crawled page source contains no search results. The error is:

    Traceback (most recent call last):
      File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 263, in <module>
        main()
      File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 257, in main
        search.search_reference(get_uesr_inpt())
      File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 100, in search_reference
        self.pre_parse_page(second_get_res.text), second_get_res.text)...
Bumps [pillow](https://github.com/python-pillow/Pillow) from 5.3.0 to 9.3.0.

Release notes, sourced from pillow's releases.

9.3.0: https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

Changes:
- Initialize libtiff buffer when saving #6699 [@radarhere]
- Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [@wiredfool]...