R Max Espinoza
When you use the `RedisSpider` class, it overrides the start URLs to read them from redis, so you have to push URLs to the `dmoz:start_urls` key, for example: `redis-cli lpush...`
By the way, using the `RedisSpider` class is optional. If you set the scheduler settings and don't use `RedisSpider`, the requests will still be sent through redis and you can start additional...
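Something like this does the same push from Python with redis-py, in case that's more convenient (a minimal sketch; host, port, and the sample URL are just examples):
```python
# Sketch: seed start URLs for a RedisSpider from Python instead of redis-cli.
# Host/port and the URL are examples; the key matches the spider name above.
import redis

r = redis.Redis(host="localhost", port=6379)
r.lpush("dmoz:start_urls", "https://example.com/")
```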
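Roughly, the scheduler-only setup looks like this in `settings.py` (a sketch based on the documented scrapy-redis settings; the `REDIS_URL` value is an example):
```python
# settings.py -- use the scrapy-redis scheduler without RedisSpider.

# Route scheduling and dupe filtering through redis so several
# processes can share one crawl queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue between runs instead of flushing it on close.
SCHEDULER_PERSIST = True

# Example value; point this at your own server.
REDIS_URL = "redis://localhost:6379"
```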
@rafaelcapucho I added this card to address this issue: https://github.com/rolando/scrapy-redis/issues/61. `DUPEFILTER_CLASS` is always defined (https://github.com/scrapy/scrapy/blob/master/scrapy/settings/default_settings.py#L113), but I agree that it's bad to allow the use of the default dupefilter. Now...
It seems you are not consuming the messages fast enough. Could you monitor and share the size of the redis keys over time? Also, are you sending items through redis?...
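A quick way to watch the key sizes is a loop like this (a sketch using redis-py; the key names are placeholders for whatever your spider actually uses):
```python
# Sketch: print redis key sizes every 10 seconds. Key names are examples;
# scrapy-redis uses a zset for requests, a set for the dupefilter, and a
# list for items by default, so each type needs its own size command.
import time

import redis

r = redis.Redis()
keys = ["dmoz:requests", "dmoz:items", "dmoz:dupefilter"]
while True:
    sizes = []
    for key in keys:
        kind = r.type(key)
        if kind == b"zset":
            sizes.append(f"{key}={r.zcard(key)}")
        elif kind == b"set":
            sizes.append(f"{key}={r.scard(key)}")
        else:
            sizes.append(f"{key}={r.llen(key)}")
    print(time.strftime("%H:%M:%S"), " ".join(sizes))
    time.sleep(10)
```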
@liuyuer could you expand on your use case? I like to recycle processes so memory doesn't pile up over time. You could make your crawler close after being...
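For example, Scrapy's built-in closespider extension can stop the process after a bounded amount of work (the numbers here are illustrative, tune them to your memory budget):
```python
# settings.py -- recycle the process after a bounded amount of work,
# using Scrapy's built-in closespider extension.
CLOSESPIDER_ITEMCOUNT = 10000  # close after this many items
CLOSESPIDER_TIMEOUT = 3600     # or after an hour, whichever comes first
```
With `SCHEDULER_PERSIST` enabled and a supervisor (supervisord, a systemd unit, etc.) restarting the process, the crawl continues from the shared redis queue.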
Sounds perfect. Please take the lead! @LuckyPigeon has been given permissions to the repo.
I think this could be improved by having a background thread that keeps a buffer of URLs to feed the Scrapy scheduler when there is capacity. The current approach relies...
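Roughly the idea, as a standalone sketch rather than scrapy-redis internals (key name and buffer size are assumptions):
```python
# Sketch: a daemon thread keeps a bounded in-memory buffer topped up from
# redis, so the scheduler-facing side never waits on the network.
import queue
import threading

import redis

BUFFER_SIZE = 100


def feeder(r, key, buf):
    """Continuously move URLs from redis into the bounded local buffer."""
    while True:
        _, url = r.blpop(key)  # blocks until a URL is available
        buf.put(url)           # blocks when full, applying backpressure


buf = queue.Queue(maxsize=BUFFER_SIZE)
r = redis.Redis()
threading.Thread(target=feeder, args=(r, "dmoz:start_urls", buf), daemon=True).start()

# The scheduler side would then pop from the buffer when it has capacity,
# e.g. buf.get_nowait() inside its next-request hook.
```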
@whg517 thanks for the initiative. Could you also include the pros and cons of moving the project to scrapy-plugins org?
This could be a dupefilter class.
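For reference, a minimal sketch of such a class using Scrapy's `BaseDupeFilter` (the in-memory set is just a placeholder for the real backing store):
```python
# Sketch: a custom dupefilter; swap the set for the actual storage.
from scrapy.dupefilters import BaseDupeFilter
from scrapy.utils.request import request_fingerprint


class SketchDupeFilter(BaseDupeFilter):
    """Drop requests whose fingerprint has already been seen."""

    def __init__(self):
        self.fingerprints = set()

    def request_seen(self, request):
        # Returning True tells the scheduler to discard the request.
        fp = request_fingerprint(request)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False
```
It would then be enabled via the `DUPEFILTER_CLASS` setting.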
@kmike good point!