R Max Espinoza
When you use the `RedisSpider` class, it overrides the start URLs to read them from redis, so you have to push URLs to the `dmoz:start_urls` key, for example: `redis-cli lpush...`
By the way, using the `RedisSpider` class is optional. If you set the scheduler settings and don't use `RedisSpider`, the requests will still be sent through redis and you can start additional...
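Something like this does the same push from Python with redis-py, in case that's more convenient (a minimal sketch; host, port, and the sample URL are just examples):
```python
# Sketch: seed start URLs for a RedisSpider from Python instead of redis-cli.
# Host/port and the URL are examples; the key matches the spider name above.
import redis

r = redis.Redis(host="localhost", port=6379)
r.lpush("dmoz:start_urls", "https://example.com/")
```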
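Roughly, the scheduler-only setup looks like this in `settings.py` (a sketch based on the documented scrapy-redis settings; the `REDIS_URL` value is an example):
```python
# settings.py -- use the scrapy-redis scheduler without RedisSpider.

# Route scheduling and dupe filtering through redis so several
# processes can share one crawl queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue between runs instead of flushing it on close.
SCHEDULER_PERSIST = True

# Example value; point this at your own server.
REDIS_URL = "redis://localhost:6379"
```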
@rafaelcapucho I added this card to address this issue: https://github.com/rolando/scrapy-redis/issues/61. `DUPEFILTER_CLASS` is always defined (https://github.com/scrapy/scrapy/blob/master/scrapy/settings/default_settings.py#L113), but I agree that it's bad to allow the use of the default dupefilter. Now...
It seems you are not consuming the messages fast enough. Could you monitor and share the size of the redis keys over time? Also, are you sending items through redis?...
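A quick way to watch the key sizes is a loop like this (a sketch using redis-py; the key names are placeholders for whatever your spider actually uses):
```python
# Sketch: print redis key sizes every 10 seconds. Key names are examples;
# scrapy-redis uses a zset for requests, a set for the dupefilter, and a
# list for items by default, so each type needs its own size command.
import time

import redis

r = redis.Redis()
keys = ["dmoz:requests", "dmoz:items", "dmoz:dupefilter"]
while True:
    sizes = []
    for key in keys:
        kind = r.type(key)
        if kind == b"zset":
            sizes.append(f"{key}={r.zcard(key)}")
        elif kind == b"set":
            sizes.append(f"{key}={r.scard(key)}")
        else:
            sizes.append(f"{key}={r.llen(key)}")
    print(time.strftime("%H:%M:%S"), " ".join(sizes))
    time.sleep(10)
```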
@liuyuer could you expand on your use case? I like to recycle processes so memory doesn't pile up over time. You could make your crawler close after being...
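For example, Scrapy's built-in closespider extension can stop the process after a bounded amount of work (the numbers here are illustrative, tune them to your memory budget):
```python
# settings.py -- recycle the process after a bounded amount of work,
# using Scrapy's built-in closespider extension.
CLOSESPIDER_ITEMCOUNT = 10000  # close after this many items
CLOSESPIDER_TIMEOUT = 3600     # or after an hour, whichever comes first
```
With `SCHEDULER_PERSIST` enabled and a supervisor (supervisord, a systemd unit, etc.) restarting the process, the crawl continues from the shared redis queue.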
Sounds perfect. Please take the lead! @LuckyPigeon has been given permissions to the repo.
I think this could be improved by having a background thread that keeps a buffer of URLs to feed the Scrapy scheduler when there is capacity. The current approach relies...
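Roughly the idea, as a standalone sketch rather than scrapy-redis internals (key name and buffer size are assumptions):
```python
# Sketch: a daemon thread keeps a bounded in-memory buffer topped up from
# redis, so the scheduler-facing side never waits on the network.
import queue
import threading

import redis

BUFFER_SIZE = 100


def feeder(r, key, buf):
    """Continuously move URLs from redis into the bounded local buffer."""
    while True:
        _, url = r.blpop(key)  # blocks until a URL is available
        buf.put(url)           # blocks when full, applying backpressure


buf = queue.Queue(maxsize=BUFFER_SIZE)
r = redis.Redis()
threading.Thread(target=feeder, args=(r, "dmoz:start_urls", buf), daemon=True).start()

# The scheduler side would then pop from the buffer when it has capacity,
# e.g. buf.get_nowait() inside its next-request hook.
```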
@whg517 thanks for the initiative. Could you also include the pros and cons of moving the project to scrapy-plugins org?
This could be a dupefilter class.
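For reference, a minimal sketch of such a class using Scrapy's `BaseDupeFilter` (the in-memory set is just a placeholder for the real backing store):
```python
# Sketch: a custom dupefilter; swap the set for the actual storage.
from scrapy.dupefilters import BaseDupeFilter
from scrapy.utils.request import request_fingerprint


class SketchDupeFilter(BaseDupeFilter):
    """Drop requests whose fingerprint has already been seen."""

    def __init__(self):
        self.fingerprints = set()

    def request_seen(self, request):
        # Returning True tells the scheduler to discard the request.
        fp = request_fingerprint(request)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False
```
It would then be enabled via the `DUPEFILTER_CLASS` setting.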
@kmike good point!