scrapy-redis
[Question] Fetching request URLs from redis fails
Description
If I insert the start URL into redis before running scrapy, it works.
But if I run scrapy first and then insert the URL, listening for the URL fails with:
2023-08-13 17:11:59 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <TestHtmlSpider 'test_html' at 0x2b05c4162d0>>
Traceback (most recent call last):
File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\signal.py", line 43, in send_catch_log
response = robustApply(
^^^^^^^^^^^^
File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 208, in spider_idle
self.schedule_next_requests()
File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 197, in schedule_next_requests
self.crawler.engine.crawl(req, spider=self)
TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'
I can't fetch URLs dynamically, and scrapy crashes.
Same error... Found a solution?
Passing a spider argument to the crawl() method of scrapy.core.engine.ExecutionEngine is no longer supported as of Scrapy 2.10.0 (see the release notes).
Try Scrapy 2.9.0 as a workaround.
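Until a release with the fix lands, one way to apply that workaround is pinning Scrapy below 2.10 in your requirements file (a sketch; adjust to however your project manages dependencies):

```
scrapy==2.9.0
```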
It looks like pull request https://github.com/rmax/scrapy-redis/pull/286, which fixes this, has existed since August.
It can easily be applied to an app on the current scrapy-redis version by overriding the schedule_next_requests method:
from scrapy import version_info as scrapy_version
from scrapy_redis.spiders import RedisSpider

class SomeSpider(RedisSpider):
    # vvv add this to the spider code
    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            # see https://github.com/scrapy/scrapy/issues/5994
            if scrapy_version >= (2, 6):
                self.crawler.engine.crawl(req)
            else:
                self.crawler.engine.crawl(req, spider=self)
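The `scrapy_version` tuple used above can come from `scrapy.version_info`; for libraries that only expose a version string, a comparable tuple can be derived as sketched below (`parse_version` is a hypothetical helper, not part of scrapy or scrapy-redis):

```python
import re

def parse_version(version: str) -> tuple:
    """Convert a version string like "2.10.0" into a comparable int tuple.

    Stops at the first component with no leading digits, so suffixes
    such as "rc1" don't break the comparison.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Tuples compare element-wise, so this is the same check as in the override:
print(parse_version("2.10.0") >= (2, 6))  # True
print(parse_version("2.9.0") >= (2, 10))  # False
```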
Hope the fixed version is released quickly.
@rmax would it be possible to release a fix for this? I'm also encountering this issue
The same problem...
Thank you for your patience. V0.8.0 has been released 🎉