seCrawler
Help w/ Errors
Hi there:
I'm hoping this post is still being monitored even though it's a bit old...:). I'm trying to use seCrawler and hitting some issues that I suspect come from running it under Python 3.x instead of 2.x, but I'm not sure. It looks elegant and powerful, so I'm hoping for help:), but I'm also wondering whether I need an approach that accounts for Google actively fighting automated searches like this.
E.g., the print statement in searchEngines.py throws an error that I think requires adding parentheses so it becomes print("total page:{0}".format(self.totalPage)). I also changed the import, adding "from seCrawler.common.searchEngines import SearchEngines" in searchResultPages.py.
Starting the crawler per the readme, scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50
(with or without the quotes, same result either way), throws an error:
File "/spiders/keywordSpider.py", line 21, in __init__
    for url in pageUrls:
TypeError: iter() returned non-iterator of type 'searResultPages'
I'm seeing some posts saying that with Python 3 you need to use def __next__(self): instead of next(), but I don't believe there is a next() statement anywhere in the code. The __init__.py file in the spiders folder is essentially commented out/blank.
My guess is this has something to do with converting the number of pages to an integer and then iterating over that value (50 by default).
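In case it helps anyone answering: from what I've read, that TypeError usually means a class whose __iter__ returns self defines the Python 2 style next() method, which Python 3 ignores because it looks for __next__ instead. A sketch of the kind of change the posts describe; the class name comes from the traceback, but its internals here are my guesses, not the real seCrawler source:

```python
# Hypothetical reconstruction of a page-URL iterator like 'searResultPages';
# only the next -> __next__ rename is the point, the rest is illustrative.
class searResultPages:
    def __init__(self, keyword, pages):
        self.keyword = keyword
        self.pages = int(pages)  # -a pages=50 arrives as a string, so convert it
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):  # Python 2 spelled this: def next(self):
        if self.current > self.pages:
            raise StopIteration
        # Guessed URL scheme for illustration only
        url = "https://www.google.com/search?q={0}&start={1}".format(
            self.keyword, (self.current - 1) * 10)
        self.current += 1
        return url

    next = __next__  # optional alias to keep Python 2 compatibility

pageUrls = searResultPages("Spider-Man", "2")
for url in pageUrls:
    print(url)
```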
Q: if anyone notices this, any suggestions on how to tweak the code to make it work? Or am I behind the times, and do all the search engines' robots.txt files forbid what I'm trying to do?
FYI, I don't think this is related to which search engine is being used, as the crawl command throws the same error whether I use google, bing, or baidu.
Thx