newsler
exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules'
scrapy crawl NewsSpider -a src_json=sources/sample.json
exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules'
How do I fix this?
@codepython can you share the whole traceback?
@rahulrrixe My command is scrapy crawl NewsSpider -a src_json=sources/forbes.json, and the traceback is below:
2015-11-16 12:14:52+0800 [scrapy] INFO: Scrapy 0.24.4 started (bot: scrapybot)
2015-11-16 12:14:52+0800 [scrapy] INFO: Optional features available: ssl, http11
2015-11-16 12:14:52+0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'newscrawler.spiders', 'SPIDER_MODULES': ['newscrawler.spiders'], 'LOG_LEVEL': 'INFO', 'DOWNLOAD_DELAY': 0.25}
2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, RotateUserAgentMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled item pipelines: DuplicatesPipeline, MongoDBPipeline
2015-11-16 12:14:52+0800 [NewsSpider] INFO: Spider opened
2015-11-16 12:14:52+0800 [NewsSpider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-11-16 12:14:53+0800 [NewsSpider] ERROR: Spider error processing <GET http://urlsearch.commoncrawl.org/?q=forbes.com>
Traceback (most recent call last):
File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/task.py", line 638, in _tick
taskObj._oneWorkUnit()
File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
result = next(self._iterator)
File "/Users/peng/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in
exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules'
2015-11-16 12:14:53+0800 [NewsSpider] INFO: Closing spider (finished)
2015-11-16 12:14:53+0800 [NewsSpider] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 237, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 2236, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2015, 11, 16, 4, 14, 53, 947818), 'log_count/ERROR': 1, 'log_count/INFO': 7, 'response_received_count': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'spider_exceptions/AttributeError': 1, 'start_time': datetime.datetime(2015, 11, 16, 4, 14, 52, 381394)}
2015-11-16 12:14:53+0800 [NewsSpider] INFO: Spider closed (finished)
If I run scrapy crawl gooseSpider -a src_json=sources/forbes.json, it reports: KeyError: 'Spider not found: gooseSpider'.
I cannot figure out what is wrong.
I have a similar problem. Where are these _rules defined in Scrapy? Or should we manage them ourselves?
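For context: _rules is not something you define yourself. CrawlSpider builds it from your spider's rules tuple when its __init__ runs, roughly like this (a paraphrased sketch of Scrapy 0.24's CrawlSpider, not a verbatim copy of the library source):

import copy

from scrapy.spider import Spider

class CrawlSpider(Spider):

    rules = ()  # your subclass declares its Rule objects here

    def __init__(self, *a, **kw):
        super(CrawlSpider, self).__init__(*a, **kw)
        self._compile_rules()  # this is what creates self._rules

    def _compile_rules(self):
        # copies self.rules into self._rules, resolving callback names
        self._rules = [copy.copy(r) for r in self.rules]

So if a subclass overrides __init__ and never calls the parent's __init__, _compile_rules() never runs and self._rules never exists, which is exactly the AttributeError above.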
I had the same issue. Has anyone resolved it?
I need to resolve this issue. There have been a lot of changes in Scrapy, so it will take a week.
The following worked for me (adding a "super" call in the spider's __init__):

def __init__(self, *a, **kw):
    super(NewsSpider, self).__init__(*a, **kw)
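In full, the fix looks something like this (a minimal sketch only; the real NewsSpider in this repo has more logic, and the src_json handling shown here is an assumption for illustration):

import json

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class NewsSpider(CrawlSpider):
    name = 'NewsSpider'
    rules = (Rule(LinkExtractor(), callback='parse_item', follow=True),)

    def __init__(self, src_json=None, *a, **kw):
        # The super() call runs CrawlSpider.__init__, which compiles
        # self.rules into self._rules; omitting it causes the
        # AttributeError reported above.
        super(NewsSpider, self).__init__(*a, **kw)
        if src_json:
            with open(src_json) as f:
                self.sources = json.load(f)  # hypothetical source config

    def parse_item(self, response):
        pass  # extraction logic would go here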
Check your code; you may be leaving out "self" when you call the parent class:

def __init__(self, *a, **kw):
    super(NewsSpider, self).__init__(*a, **kw)
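On Python 3 and recent Scrapy versions, the same call can be written more simply as super().__init__(*a, **kw); the important part is that CrawlSpider's __init__ runs at all.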
I had the same problem, which is why I hope this helps you.