scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

Results 17 scrapy-cluster issues

Adding support for 1. Custom headers and cookies with the initial request 2. Shared cookies middleware to share cookies between crawl nodes. Linked issue #182

I needed to request a URL with custom headers and preset cookies. For example, there is an API at `https://xyz.com/test_api/_id` which returns JSON, and this should be called with api...

crawler
feature request

Added UI testing using selenium and python

Checklist of items that I know need to be worked on before the [ui](https://github.com/istresearch/scrapy-cluster/tree/ui) branch can be merged into the [dev](https://github.com/istresearch/scrapy-cluster/tree/dev) branch:
- [x] Create documentation
- [x] Add offline unit tests...

help wanted
documentation
unit testing
ui

Many of the individual components break down or crash when their required infrastructure is not available. They depend on Kafka, Redis, or ZooKeeper, but don't have good mechanisms always...

crawler
kafka-monitor
redis-monitor
unit testing

Upgrade the project to Python 3.10

I ran the Scrapy Cluster spider start code and ended up getting this error message. I have no idea what this could be and have been troubleshooting for a while...