Action to restart spider in Scrapy Cloud

Open andrewbaxter opened this issue 6 years ago • 4 comments

Discussed in chat a bit - the idea is that if a job meets some conditions (a monitor detects certain website responses, the job stalls, etc.), this action could restart the job.

Ideas for how to count restarts to prevent infinite restarting included:

  • job metadata
  • spider parameter
  • tag the job + restarts with a uuid or something and look up the previous job(s) with the tag to get the restart count
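
As a rough sketch of the "spider parameter" idea, the restart count could travel with the job's arguments and be checked before rescheduling. Everything named here is an assumption for illustration: the `restart_count` argument, the limit of 3, and the `schedule` callable (a stand-in for whatever actually starts a job) are hypothetical and not part of Spidermon or Scrapy Cloud.

```python
from typing import Callable, Dict, Optional

MAX_RESTARTS = 3                # hypothetical limit, pick what fits the project
RESTART_ARG = "restart_count"   # hypothetical spider argument name


def next_restart_args(
    spider_args: Dict[str, str], max_restarts: int = MAX_RESTARTS
) -> Optional[Dict[str, str]]:
    """Return the arguments for the restarted job, or None if the limit is hit."""
    # A job that was never restarted has no counter yet, so default to 0.
    count = int(spider_args.get(RESTART_ARG, 0))
    if count >= max_restarts:
        return None
    new_args = dict(spider_args)  # copy so the caller's dict is not modified
    new_args[RESTART_ARG] = str(count + 1)
    return new_args


def restart_job(
    spider_name: str,
    spider_args: Dict[str, str],
    schedule: Callable[[str, Dict[str, str]], None],
    max_restarts: int = MAX_RESTARTS,
) -> bool:
    """Schedule a restart through `schedule`; return False once the limit is reached."""
    new_args = next_restart_args(spider_args, max_restarts)
    if new_args is None:
        return False
    schedule(spider_name, new_args)
    return True
```

In an actual action, `schedule` would presumably wrap a call to the Scrapy Cloud API that starts the same spider with `new_args`. The job metadata and tag-based options would follow the same shape, only reading and writing the counter somewhere else.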

andrewbaxter, Mar 14 '19 19:03

Should we create this coupling between Spidermon and Scrapy Cloud?

ejulio, Mar 18 '19 12:03

@rennerocha said he didn't see a problem, so I made this ticket, but I may have misread his response. I have no strong opinions either way. Since it's a common use case at the moment, I think it would be more convenient, but I can understand wanting to limit the number of dependencies.

andrewbaxter, Mar 18 '19 13:03

This could work out well but what exactly is the use case here? Something like restarting after a particular amount of time in specific circumstances?

vipulgupta2048, Apr 06 '19 02:04

I'm not sure there's a specific use case, but this comes up every once in a while. Sometimes jobs get stuck for some reason and restarting them gets them going again, or the website upstream starts having issues which corrupt the data and it's not worth the effort of trying to recover.

andrewbaxter, Apr 08 '19 15:04