Scrapy Realtime Execution

Open netconstructor opened this issue 8 years ago • 2 comments

I have a brief question: I have been looking for a reliable way to run specific Scrapy spiders in a "realtime" fashion. More specifically, I often find myself wanting to create a Scrapy spider and call it on demand (passing variables if needed), simply to retrieve data in realtime for insertion into a website.

It seems that this "realtime" capability is not addressed by your admin approach (or by any of the other Scrapy admin approaches I have reviewed). Instead, such admin tools always seem to be targeted at managing Scrapy extractions as "bulk" jobs/runs.

I would greatly appreciate it if someone could address this topic, as well as the feasibility of extending this script to handle such functionality in a reliable and effective way.

Any comments or known issues/limitations others might have experienced in delivering such a solution would be very much appreciated.

Chris H.

netconstructor avatar Apr 13 '17 18:04 netconstructor

Maybe you can look at the Swagger doc at http://localhost:5000/api.html; there you will find the API to run a spider.

English is not my first language ^ ^

DormyMo avatar Apr 14 '17 12:04 DormyMo

@netconstructor, I have only just discovered SpiderKeeper, so I don't know what it provides for your use case. But you can check out https://github.com/scrapinghub/scrapyrt:

You simply run Scrapyrt in a Scrapy project directory, and it starts an HTTP server that lets you schedule your spiders and get the spider output in JSON format.

redapple avatar Apr 14 '17 13:04 redapple
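
For anyone wanting to try the Scrapyrt suggestion above, here is a minimal sketch of calling a spider on demand from Python. It assumes Scrapyrt has been started in the project directory and is listening on its default port 9080, that the `crawl.json` endpoint accepts `spider_name` and `url` query parameters, and that the response carries the scraped data under an `items` key. The spider name `quotes` and the start URL are hypothetical placeholders, so check them and the endpoint details against your own setup and Scrapyrt version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Scrapyrt is running locally on its default port.
SCRAPYRT_BASE = "http://localhost:9080"


def build_crawl_url(spider_name, start_url, base=SCRAPYRT_BASE):
    """Build the GET URL that asks Scrapyrt to run one spider on one URL."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}/crawl.json?{query}"


def crawl(spider_name, start_url, base=SCRAPYRT_BASE, timeout=60):
    """Run the spider and return the scraped items (empty list if none)."""
    request_url = build_crawl_url(spider_name, start_url, base)
    with urlopen(request_url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: scraped items are returned under the "items" key.
    return payload.get("items", [])


if __name__ == "__main__":
    # "quotes" is a hypothetical spider name; replace it with your own.
    for item in crawl("quotes", "http://quotes.toscrape.com/"):
        print(item)
```

Because the call blocks until the spider finishes, a website using this for realtime lookups would probably want a short timeout and some caching in front of it.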