
[Question] Fetching request URLs from Redis fails

Open KokoTa opened this issue 1 year ago • 8 comments

Description

If I insert the start URL into Redis before running Scrapy, it works.

But if I run Scrapy first and then insert the URL, the listener fails with:

2023-08-13 17:11:59 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <TestHtmlSpider 'test_html' at 0x2b05c4162d0>>
Traceback (most recent call last):
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\signal.py", line 43, in send_catch_log
    response = robustApply(
               ^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 208, in spider_idle
    self.schedule_next_requests()
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 197, in schedule_next_requests
    self.crawler.engine.crawl(req, spider=self)
TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'

I can't fetch URLs dynamically, and Scrapy crashes.
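For reference, a start URL can be pushed while the spider is running with redis-cli; the key name below assumes the default redis_key of <spider_name>:start_urls:

```
redis-cli lpush test_html:start_urls https://example.com/
```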

KokoTa avatar Aug 13 '23 09:08 KokoTa

Same error... Has anyone found a solution?

Shleif91 avatar Sep 04 '23 08:09 Shleif91

Passing a spider argument to the crawl() method of scrapy.core.engine.ExecutionEngine is no longer supported as of Scrapy 2.10.0 (see the release notes).

Try scrapy 2.9.0.
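For example, pinning the version in requirements.txt (the version number is just the one suggested above):

```
scrapy==2.9.0
```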

gc1423 avatar Sep 14 '23 11:09 gc1423

It looks like pull request https://github.com/rmax/scrapy-redis/pull/286, which fixes this, has existed since August. The fix can easily be applied to an app using the current scrapy-redis version by overriding the schedule_next_requests method:

from scrapy import version_info as scrapy_version
from scrapy_redis.spiders import RedisSpider


class SomeSpider(RedisSpider):
    # vvv add this override to your spider code
    def schedule_next_requests(self):
        """Schedules requests if available."""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            # Scrapy 2.6 deprecated (and 2.10 removed) the spider kwarg,
            # see https://github.com/scrapy/scrapy/issues/5994
            if scrapy_version >= (2, 6):
                self.crawler.engine.crawl(req)
            else:
                self.crawler.engine.crawl(req, spider=self)
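The version gate can be illustrated standalone; pick_crawl_call below is a hypothetical helper that mirrors the tuple comparison (Scrapy itself exposes its version as the tuple scrapy.version_info):

```python
# Hypothetical helper mirroring the version gate in the override above.
# Scrapy deprecated the spider kwarg to ExecutionEngine.crawl() in 2.6
# and removed it in 2.10, so a tuple comparison picks the right call.
def pick_crawl_call(version):
    """Return the crawl() signature appropriate for a Scrapy version tuple."""
    if version >= (2, 6):
        return "engine.crawl(request)"
    return "engine.crawl(request, spider=spider)"

print(pick_crawl_call((2, 11, 0)))  # modern Scrapy: no spider kwarg
print(pick_crawl_call((2, 5, 1)))   # older Scrapy: spider kwarg required
```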


GeorgeA92 avatar Nov 25 '23 19:11 GeorgeA92

Hoping the fixed version is released quickly.

xuexingdong avatar Jan 08 '24 12:01 xuexingdong

@rmax would it be possible to release a fix for this? I'm also encountering this issue

jordinl avatar May 16 '24 15:05 jordinl

The same problem...

migrant avatar Jun 18 '24 16:06 migrant

@rmax would it be possible to release a fix for this? I'm also encountering this issue. Thanks

georgeJzzz avatar Jun 24 '24 02:06 georgeJzzz

Thank you for your patience. V0.8.0 has been released 🎉

rmax avatar Jul 04 '24 06:07 rmax