SpiderKeeper Scrapy Realtime Execution

Scrapy Realtime Execution

Open netconstructor opened this issue 8 years ago • 2 comments

I have a brief question: I have been looking for a reliable way to run specific scrapy spiders in a "realtime" style of execution. More specifically, I often find myself wanting to create a scrapy spider and call it on demand (passing variables if needed) simply to retrieve data in realtime for insertion into a website.

It seems that this "realtime" capability is not addressed within your admin approach (as well as all other scrapy admin approaches I have reviewed). Instead it seems that all such admin related management capabilities are always targeted at managing scrapy extractions as "bulk" jobs/runs.

I would greatly appreciate it if someone could address this topic, as well as the feasibility of extending this script to handle such functionality in a reliable and effective way.

Any comments or known issues/limitations others might have experienced in delivering such a solution would be very much appreciated.

Chris H.

Apr 13 '17 18:04 netconstructor

Maybe you can look at the Swagger doc at http://localhost:5000/api.html ; there you will find the API to run the spider.

English is not my native language ^ ^

Apr 14 '17 12:04 DormyMo

@netconstructor, I have just discovered SpiderKeeper, so I don't know what it provides for your use case. But you can check out https://github.com/scrapinghub/scrapyrt:

You simply run Scrapyrt in Scrapy project directory and it starts HTTP server allowing you to schedule your spiders and get spider output in JSON format.
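
As a rough illustration of the on-demand call described above, here is a minimal Python sketch of a client for a locally running Scrapyrt. It assumes Scrapyrt is listening on its default port 9080 and exposes the `crawl.json` endpoint with `spider_name` and `url` parameters; the helper names `build_crawl_url` and `fetch_items`, and the spider name used below, are hypothetical and only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Scrapyrt was started in the Scrapy project directory
# and is listening on its default port.
SCRAPYRT_BASE = "http://localhost:9080"


def build_crawl_url(spider_name, start_url, base=SCRAPYRT_BASE):
    """Build the crawl.json request URL for one on-demand spider run."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}/crawl.json?{query}"


def fetch_items(spider_name, start_url, timeout=30):
    """Run the spider once and return the scraped items as a list of dicts."""
    with urlopen(build_crawl_url(spider_name, start_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # Scrapyrt reports a "status" field alongside the scraped "items".
    if payload.get("status") != "ok":
        raise RuntimeError(f"crawl failed: {payload}")
    return payload.get("items", [])
```

With a spider named, say, `quotes` in the project, calling `fetch_items("quotes", "http://quotes.toscrape.com/")` from the website's backend would block until the crawl finishes and then return the items, so it is best suited to spiders that fetch only a page or two per request.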

Apr 14 '17 13:04 redapple
