AadhaarSearchEngine
Issue with the imports in the SearchResultPages.py file
I have replaced the code in SearchResultPages.py. My updated code looks like this:

```python
SearchEngines = {
    'google': 'https://www.google.com/search?q={0}&start={1}',
    'bing': 'http://www.bing.com/search?q={0}&first={1}',
    'baidu': 'http://www.baidu.com/s?wd={0}&pn={1}'
}

SearchEngineResultSelectors = {
    'google': '//h3/a/@href',
    'bing': '//h2/a/@href',
    'baidu': '//h3/a/@href',
}


class SearchResultPages:
    totalPage = 0
    keyword = None
    searchEngineUrl = None
    currentPage = 0
    searchEngine = None

    def __init__(self, keyword, search_engine, total_page):
        self.searchEngine = search_engine.lower()
        self.searchEngineUrl = SearchEngines[self.searchEngine]
        self.totalPage = total_page
        self.keyword = keyword

    def __iter__(self):
        return self

    def _currentUrl(self):
        return self.searchEngineUrl.format(self.keyword, str(self.currentPage * 10))

    def next(self):
        if self.currentPage < self.totalPage:
            url = self._currentUrl()
            self.currentPage = self.currentPage + 1
            return url
        raise StopIteration
```
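For context, the traceback below shows that AadhaarSpider.py consumes this class by iterating over it. A hypothetical reconstruction of that call site (the `page_urls` name comes from the traceback; the arguments mirror the crawl command, so this is not the exact spider code):

```python
# Hypothetical reconstruction of the call site in AadhaarSpider.py,
# based only on the traceback and the crawl command below.
page_urls = SearchResultPages('aadhaar meri pehachan filetype:pdf', 'google', 10)
for url in page_urls:  # under Python 3 this line raises the TypeError shown below
    print(url)
```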
But I'm still getting an error message:
```
scrapy crawl AadhaarSpider -a keyword="aadhaar meri pehachan filetype:pdf" -a se=google -a pages=10
2019-10-04 01:47:30 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: AadhaarSearchEngine)
2019-10-04 01:47:30 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.4.2, Platform Windows-10-10.0.18362-SP0
2019-10-04 01:47:30 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'AadhaarSearchEngine', 'DEPTH_LIMIT': 1, 'DOWNLOAD_DELAY': 30, 'NEWSPIDER_MODULE': 'AadhaarSearchEngine.spiders', 'SPIDER_MODULES': ['AadhaarSearchEngine.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}
2019-10-04 01:47:31 [scrapy.extensions.telnet] INFO: Telnet Password: 1fbab3f040eebf68
2019-10-04 01:47:32 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2019-10-04 01:47:32 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "c:\users\bordi\anaconda3\lib\site-packages\scrapy\crawler.py", line 184, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "c:\users\bordi\anaconda3\lib\site-packages\scrapy\crawler.py", line 188, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "c:\users\bordi\anaconda3\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "c:\users\bordi\anaconda3\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
2019-10-04 01:47:32 [twisted] CRITICAL:
Traceback (most recent call last):
  File "c:\users\bordi\anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\bordi\anaconda3\lib\site-packages\scrapy\crawler.py", line 85, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "c:\users\bordi\anaconda3\lib\site-packages\scrapy\crawler.py", line 108, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "c:\users\bordi\anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
  File "C:\Users\bordi\Desktop\pr2\AadhaarSearchEngine\spiders\AadhaarSpider.py", line 24, in __init__
    for url in page_urls:
TypeError: iter() returned non-iterator of type 'SearchResultPages'
```
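The cause appears to be the Python 2 iterator protocol: under Python 3, a `for` loop calls `iter()` on the object and requires the returned object to define `__next__`; a class whose `__iter__` returns `self` but only defines `next` (the Python 2 name) produces exactly this `TypeError`. A minimal sketch of the likely fix, renaming the method and keeping the rest of the class as posted above (it still relies on the `SearchEngines` dict defined earlier):

```python
class SearchResultPages:
    def __init__(self, keyword, search_engine, total_page):
        self.searchEngine = search_engine.lower()
        self.searchEngineUrl = SearchEngines[self.searchEngine]
        self.totalPage = total_page
        self.keyword = keyword
        self.currentPage = 0

    def __iter__(self):
        return self

    def _currentUrl(self):
        return self.searchEngineUrl.format(self.keyword, str(self.currentPage * 10))

    # Python 3 looks for __next__, not next; defining only next() is what
    # makes iter() report the object as a non-iterator.
    def __next__(self):
        if self.currentPage < self.totalPage:
            url = self._currentUrl()
            self.currentPage += 1
            return url
        raise StopIteration

    next = __next__  # optional alias so the class still works on Python 2
```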