scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
Adding support for (1) custom headers and cookies with the initial request and (2) a shared cookies middleware to share cookies between crawl nodes. Linked issue: #182
I needed to request a URL with custom headers and preset cookies. E.g., there is an API at `https://xyz.com/test_api/_id` which returns JSON, and this should be called with api...
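The need described above, an initial request that carries custom headers and preset cookies, can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: it uses the Python standard library so that it runs without a cluster, the helper name `build_initial_request` is hypothetical, and the `X-Api-Key` header and `session` cookie values are placeholders. In Scrapy itself, `scrapy.Request` accepts `headers` and `cookies` keyword arguments for the same purpose.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def build_initial_request(url, headers, cookies):
    """Build (but do not send) a request with custom headers and preset cookies.

    Hypothetical helper for illustration; not part of scrapy-cluster.
    """
    # Serialize the cookie dict into one "name=value; name2=value2" string.
    jar = SimpleCookie()
    for name, value in cookies.items():
        jar[name] = value
    cookie_header = "; ".join(
        f"{morsel.key}={morsel.coded_value}" for morsel in jar.values()
    )

    # Attach the custom headers, then the Cookie header if any cookies were given.
    request = Request(url, headers=dict(headers))
    if cookie_header:
        request.add_header("Cookie", cookie_header)
    return request


# Placeholder header and cookie values, assumed for this example only.
request = build_initial_request(
    "https://xyz.com/test_api/_id",
    headers={"Accept": "application/json", "X-Api-Key": "placeholder-key"},
    cookies={"session": "abc123"},
)
print(request.get_header("Cookie"))
```

Nothing is sent over the network here; the request object is only constructed, so the headers and cookies can be inspected before a crawl node would issue it.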
Uitest
Added UI testing using Selenium and Python
Checklist of items that I know need to be worked on before the [ui](https://github.com/istresearch/scrapy-cluster/tree/ui) branch can be merged into the [dev](https://github.com/istresearch/scrapy-cluster/tree/dev) branch:
- [x] Create documentation
- [x] Add offline unit tests...
Many of the individual components break down or crash when their required infrastructure is not available. They depend on Kafka, Redis, or Zookeeper, but don't have good mechanisms always...
Upgrade the project to Python 3.10
I ran the Scrapy Cluster spider start code and ended up getting this error message. I have no idea what this could be and have been troubleshooting for a while....