CNKICrawler: crawler usage questions

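Hello, I'm a beginner and would like to use your crawler to collect material from CNKI as preliminary research for writing a paper. When I tried your code, I couldn't get it to crawl anything:
1. I couldn't find where to enter the keyword to crawl. Later I saw an open call in search_page, so I created a txt file and put the keyword in it, but it still didn't work.
2. In spider_paper, the request library is flagged as unused, and title is flagged as unused too. Does that matter?
3. The program ends immediately after I run it. Why is that?
Thanks for your help!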

Jun 10 '17 11:06 tiantian0417

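Hello! You can set the search term under keyword in the Config.conf file and then run spider_main. Also, please send me any error output so that I can help you track down the problem.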

Jun 10 '17 11:06 qiuqingyu

There is no error, but it doesn't seem to crawl anything. No Excel sheet is updated or created. If the keyword is set in the config, what is the open call in search_page for?

Jun 10 '17 11:06 tiantian0417

Could you give me a way to contact you directly? Communicating here feels a bit... er... slow.

Jun 10 '17 12:06 tiantian0417

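That open call opens the cache file. The crawler first crawls article information from the search result pages for the given keyword and stores it in the cache file data-detail.txt. open is Python's built-in function for opening files. My email is: [email protected]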

Jun 10 '17 12:06 qiuqingyu

After changing keyword in the config, I get the following error:
MissingSectionHeaderError: File contains no section headers. file: 'Config.conf', line: 1 '\ufeff[base]\n'

Jun 10 '17 12:06 tiantian0417

This is an encoding problem. The config file needs to be saved as UTF-8 without a BOM.

Jun 10 '17 12:06 qiuqingyu
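The '\ufeff' in front of [base] in the error message is a byte order mark, which is why the section header is not recognized. Besides re-saving the file, a minimal sketch of two alternatives follows: reading the file with Python's "utf-8-sig" codec, which tolerates a BOM, or rewriting the file without one. The helper names read_config and strip_bom are hypothetical and are not part of CNKICrawler.

```python
import configparser


def read_config(path):
    """Parse an INI-style file, tolerating a leading UTF-8 BOM.

    Hypothetical helper, not part of CNKICrawler.
    """
    parser = configparser.ConfigParser()
    # "utf-8-sig" drops a BOM if one is present and otherwise
    # behaves like plain "utf-8".
    with open(path, encoding="utf-8-sig") as handle:
        parser.read_file(handle)
    return parser


def strip_bom(path):
    """Rewrite the file in place as UTF-8 without a BOM."""
    with open(path, encoding="utf-8-sig") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Calling strip_bom("Config.conf") once should have the same effect as re-saving the file as UTF-8 without a BOM in an editor, so the crawler's own code would not need to change.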

Er, I'm quite new to this. Could you tell me what each of the parameters in the config does?

Jun 10 '17 12:06 tiantian0417

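[base]
keyword = 慢性粒细胞白血病    # search keyword
currentpage = 0    # the current search page, saved as progress
maxpage = 1    # maximum number of search pages
searchlocation = 全文    # search mode; one of five: 全文 (full text), 主题 (subject), 篇名 (title), 作者 (author), 摘要 (abstract)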

Jun 10 '17 12:06 qiuqingyu
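To see how these four settings might be read from Python, here is a minimal sketch using the standard configparser module. The function name load_settings and the sample text are illustrative assumptions; how CNKICrawler itself loads Config.conf is not shown in this thread. Note that configparser does not strip trailing # comments by default, so the sketch passes inline_comment_prefixes in case the file contains them.

```python
import configparser

# Sample text mirroring the settings described above (an assumption
# about the file layout, not a copy of the real Config.conf).
SAMPLE = """\
[base]
keyword = 慢性粒细胞白血病  # search keyword
currentpage = 0
maxpage = 1
searchlocation = 全文
"""


def load_settings(text):
    """Return the [base] settings as a dict with typed values."""
    # Without inline_comment_prefixes, "# search keyword" would be
    # kept as part of the keyword value.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    base = parser["base"]
    return {
        "keyword": base.get("keyword"),
        "currentpage": base.getint("currentpage"),
        "maxpage": base.getint("maxpage"),
        "searchlocation": base.get("searchlocation"),
    }


settings = load_settings(SAMPLE)
```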

Another error has come up:
TypeError: catching classes that do not inherit from BaseException is not allowed

Jun 10 '17 12:06 tiantian0417

Hello, could you post the exact statement that raises the error? I haven't run into this problem before.

Jun 10 '17 13:06 qiuqingyu

File "E:/新建文件夹/CNKICrawler-master/CNKICrawler-master/spider_main.py", line 65, in
    except urllib.error:
TypeError: catching classes that do not inherit from BaseException is not allowed

The surrounding code:

    if attempts == 50:
        break
except urllib.error:  # this is the line that fails
    attempts += 1
    print(("第"+str(attempts)+"次重试！！"))

Jun 10 '17 13:06 tiantian0417

That line shouldn't be a problem. Are you using Python 3?

Jun 10 '17 13:06 qiuqingyu

Yes, I'm using Python 3.

Jun 10 '17 13:06 tiantian0417
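A note on the TypeError above: in Python 3, urllib.error is a module rather than an exception class, so an except clause that names it fails as soon as an exception reaches it. Catching urllib.error.URLError (the base class of HTTPError) is the usual form. The following is a minimal sketch of a retry loop written that way. The function name fetch_with_retry and the opener parameter are assumptions for illustration; this is not the code in spider_main.py.

```python
import urllib.error
import urllib.request


def fetch_with_retry(url, max_attempts=50, opener=urllib.request.urlopen):
    """Fetch a URL, retrying on network errors.

    Hypothetical helper. Returns the response body as bytes, or None
    if every attempt failed.
    """
    attempts = 0
    while attempts < max_attempts:
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError:  # an exception class, not the module
            attempts += 1
            print("Retry " + str(attempts))
    return None
```

The opener parameter exists only so the loop can be exercised without network access; in normal use the default urllib.request.urlopen applies.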

Now when I run it, nothing happens at all.

Jun 12 '17 02:06 tiantian0417

By "nothing happens", do you mean there are no results? Is there no console output either?

Jun 12 '17 02:06 qiuqingyu

Right. Nothing happens and there is no output. I'm at a loss.

Jun 12 '17 03:06 tiantian0417

Hmm... how about you tell me what you need, and I'll crawl it and send it to you?

Jun 12 '17 08:06 qiuqingyu

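Well, the thing is, I want to crawl material for the paper we are writing. Since it is a review article, there may be quite a lot to crawl, and I would also like to learn how this crawler is written... so...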

Jun 12 '17 08:06 tiantian0417

I've deployed a crawler online. Give it a try and see whether it meets your needs: http://www.qiuqingyu.cn/todolist/

Jun 15 '17 07:06 qiuqingyu

After crawling a little over 800 records, your crawler keeps re-crawling content it has already crawled. How can this be fixed?

Jan 09 '18 07:01 RYFan-RS

I didn't consider deduplication when I wrote this crawler. Sorry about that.

Jan 09 '18 08:01 qiuqingyu
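One common way to add deduplication is to remember an identifier for each record already seen and skip repeats. The sketch below assumes each crawled record is a dict with a "url" field that identifies the article; that field name and the function dedupe are hypothetical, since the thread does not show how CNKICrawler stores its records.

```python
def dedupe(records, key="url"):
    """Return records in their original order, dropping repeats.

    Two records count as duplicates when they share the same value
    under `key` (a hypothetical field name).
    """
    seen = set()
    unique = []
    for record in records:
        identifier = record[key]
        if identifier in seen:
            continue  # already crawled, skip it
        seen.add(identifier)
        unique.append(record)
    return unique
```

Keeping the seen set alive across pages would also let a crawl loop stop early once a whole page yields nothing new, which may be what is happening after the 800th record.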

Why doesn't it work with a different website? And this site can't be reached now.

Jan 29 '18 07:01 john-ogden

What causes this problem? I need an answer urgently!
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'

Jul 11 '18 14:07 Cairang

pagesum_text = soup.find('span', class_='page-sum').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Dec 09 '19 02:12 dockerwang
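This AttributeError means soup.find returned None, that is, the fetched HTML contained no span element with class page-sum. Why it is missing cannot be determined from the thread; a changed page layout or a blocked request are plausible guesses. A minimal sketch of a guard follows. The function name read_page_sum is hypothetical, and soup is assumed to be a BeautifulSoup object or anything else with a compatible find method.

```python
def read_page_sum(soup):
    """Return the text of the page-sum span, or None if it is absent.

    Hypothetical helper; `soup` is assumed to offer a
    BeautifulSoup-style find(name, class_=...) method.
    """
    node = soup.find("span", class_="page-sum")
    if node is None:
        # The element is missing. Let the caller decide whether to
        # retry, log the raw HTML, or stop.
        return None
    return node.get_text()
```

When the function returns None, printing the downloaded HTML is a quick way to check what the site actually sent back.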

Quoting Cairang:
"What causes this problem? I need an answer urgently!
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'"

I've run into this problem too. How can it be solved?

Apr 12 '20 12:04 zhangmengna151822

Quoting qiuqingyu:
"I've deployed a crawler online. Give it a try and see whether it meets your needs: http://www.qiuqingyu.cn/todolist/"

This URL won't open.

Apr 12 '20 12:04 zhangmengna151822

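Quoting Cairang:
"What causes this problem? I need an answer urgently!
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'"

Quoting zhangmengna151822:
"I've run into this problem too. How can it be solved?"

I've run into this problem too. How can it be solved?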

Oct 18 '20 09:10 caichen1234567
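On the PseudoOutputFile error reported several times above: PseudoOutputFile is the stand-in for sys.stdout used by IDLE's shell in older Python versions, and it has no buffer attribute, so the error most likely means the script was started from IDLE. Running it from a regular terminal is one way around it. Another is to wrap the stream only when a buffer exists, as in this minimal sketch. The function name utf8_stream is hypothetical and is not part of CNKICrawler.

```python
import io


def utf8_stream(stream):
    """Return a UTF-8 text wrapper around stream.buffer if it exists.

    Streams without a binary buffer (such as IDLE's replacement for
    sys.stdout) are returned unchanged.
    """
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        return stream
    return io.TextIOWrapper(buffer, encoding="utf8")
```

The failing line in the script could then become sys.stdout = utf8_stream(sys.stdout), after import sys.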
