
Help w/ Errors


Hi there:

Am hoping this post is still being monitored even if it's a bit old... :). I'm trying to use seCrawler and having some issues that I suspect are related to using Python 3.x instead of 2.x, but am unsure. It looks pretty elegant and powerful, so am hoping for help :), but am wondering too whether I need an approach that takes into account Google fighting automated searches like this.

E.g., the print statement in searchEngines.py throws a syntax error under Python 3 that I think requires adding parentheses, so it becomes print("total page:{0}".format(self.totalPage)). I also changed the import in searchResultPages.py to "from seCrawler.common.searchEngines import SearchEngines".
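For the record, here's a minimal sketch of that print fix (total_page stands in for self.totalPage, since the real line lives inside a class in searchEngines.py):

```python
# Minimal sketch of the Python 3 print fix; total_page is a stand-in for
# self.totalPage. The function-call form also runs on Python 2.6+ thanks
# to the __future__ import.
from __future__ import print_function

total_page = 50  # stand-in for self.totalPage
print("total page:{0}".format(total_page))
```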

Starting the crawler as shown in the readme, scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50 (with or without quotes around the arguments, same result either way), throws an error:

  File "/spiders/keywordSpider.py", line 21, in __init__
    for url in pageUrls:
TypeError: iter() returned non-iterator of type 'searResultPages'

I'm seeing some posts around saying that with Python 3 you need to define __next__(self) instead of next(self), but I don't believe there is a next() statement anywhere in the code. The __init__.py file in the spiders folder is essentially commented out/blank.
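To illustrate what those posts are describing (a toy sketch of my own, not the actual seCrawler source): that exact TypeError is what Python 3 raises when __iter__ returns an object that defines the old Python 2 next() method but no __next__(). If searResultPages is built like the hypothetical class below, aliasing the method would fix it:

```python
# Toy illustration (not the seCrawler code): in Python 3, a class whose
# __iter__ returns self must define __next__; Python 2 looked for next()
# instead, which produces "iter() returned non-iterator" under Python 3.
class PageUrls(object):
    def __init__(self, keyword, pages):
        self.urls = ["https://www.google.com/search?q={0}&start={1}"
                     .format(keyword, i * 10) for i in range(int(pages))]
        self.index = 0

    def __iter__(self):
        return self

    def next(self):            # Python 2 iterator protocol
        if self.index >= len(self.urls):
            raise StopIteration
        url = self.urls[self.index]
        self.index += 1
        return url

    __next__ = next            # alias so the Python 3 protocol works too

for url in PageUrls("Spider-Man", 3):
    print(url)
```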

My guess is this has something to do with turning the number of pages into an integer and then iterating through that value (50 as the default).
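One data point on that: scrapy passes -a arguments to the spider's __init__ as strings, so pages does need an explicit cast somewhere. A hypothetical sketch of what that might look like (class and parameter names are my guesses, not the actual keywordSpider.py):

```python
# Hypothetical sketch: scrapy delivers -a arguments as strings, so
# "-a pages=50" arrives as the string "50" and must be cast before use.
import scrapy

class KeywordSpider(scrapy.Spider):
    name = "keywordSpider"

    def __init__(self, keyword=None, se="google", pages="50", *args, **kwargs):
        super(KeywordSpider, self).__init__(*args, **kwargs)
        self.keyword = keyword
        self.se = se
        self.pages = int(pages)  # cast before using it to drive a range()
```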

Q: if anyone notices this, any suggestions for how to tweak the code to make it work? Or am I behind the times, and do all the search engines' robots.txt files forbid what I'm trying to do?

FYI, I don't think this is related to which search engine is being used, as the crawl command throws the same error whether I use google, bing, or baidu.

Thx

senrabdet · Aug 30 '18 13:08