SpiderKeeper
Scrapy Realtime Execution
I have a brief question. I have been looking for a reliable way to run specific Scrapy spiders in a "realtime" fashion. More specifically, I often want to create a Scrapy spider and call it on demand (passing variables if needed) simply to retrieve data in realtime for insertion into a website.
It seems that this "realtime" capability is not addressed within your admin approach (as well as all other scrapy admin approaches I have reviewed). Instead it seems that all such admin related management capabilities are always targeted at managing scrapy extractions as "bulk" jobs/runs.
I would greatly appreciate it if someone could address this topic, as well as the feasibility of extending this script to handle such functionality in a reliable and effective way.
Any comments or known issues/limitations others might have experienced in delivering such a solution would be very much appreciated.
Chris H.
Maybe you can look at the Swagger doc at http://localhost:5000/api.html. There you will find the API to run the spider.
English is not my language ^ ^
@netconstructor, I have just discovered SpiderKeeper, so I don't know what it provides for your use case. But you can check out https://github.com/scrapinghub/scrapyrt:
You simply run Scrapyrt in Scrapy project directory and it starts HTTP server allowing you to schedule your spiders and get spider output in JSON format.
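For what it's worth, here is a rough sketch of how a website backend might call Scrapyrt on demand. It assumes Scrapyrt is running in the project directory on its default port (9080) and exposes its `crawl.json` endpoint with the `spider_name` and `url` query parameters. The helper names `build_crawl_url` and `fetch_items`, and the spider name in the usage comment, are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_crawl_url(base_url, spider_name, start_url):
    """Build the crawl.json URL for one on-demand spider run.

    Assumes the Scrapyrt server accepts `spider_name` and `url`
    as query parameters on its crawl.json endpoint.
    """
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base_url.rstrip('/')}/crawl.json?{query}"


def fetch_items(base_url, spider_name, start_url, timeout=30):
    """Run the spider through Scrapyrt and return the scraped items.

    The call blocks until the crawl finishes, so the timeout should
    cover the slowest page the spider is expected to fetch.
    """
    request_url = build_crawl_url(base_url, spider_name, start_url)
    with urlopen(request_url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the JSON response carries scraped items under "items".
    return payload.get("items", [])


# Hypothetical usage (needs a running Scrapyrt server and a spider named "quotes"):
# items = fetch_items("http://localhost:9080", "quotes", "http://example.com/page")
```

Since each request waits for the crawl to complete, this probably only fits spiders that fetch a handful of pages per call; anything longer is likely still better handled as a scheduled job.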