spidermon
Action to restart spider in Scrapy Cloud
Discussed in chat a bit - the idea is that if a job meets some conditions (the monitor detects certain website responses, the job stalls, etc.), this action could restart the job.
Ideas for how to count restarts to prevent infinite restarting included:
- job metadata
- spider parameter
- tag the job and its restarts with a UUID (or similar) and look up the previous job(s) with that tag to get the restart count
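As a rough illustration of the spider-parameter approach, the restart budget could be tracked in the job arguments. This is only a sketch: the function name, the `restart_count` argument, and `MAX_RESTARTS` are all hypothetical, not part of Spidermon's or Scrapy Cloud's API.

```python
# Hypothetical bookkeeping for the "spider parameter" restart-counting idea.
# The names here (next_restart_args, restart_count, MAX_RESTARTS) are
# illustrative assumptions, not an existing Spidermon interface.

MAX_RESTARTS = 3


def next_restart_args(current_args, max_restarts=MAX_RESTARTS):
    """Return job args for a restarted job, or None when the restart
    budget is exhausted (to prevent infinite restarting)."""
    count = int(current_args.get("restart_count", 0))
    if count >= max_restarts:
        return None
    new_args = dict(current_args)
    new_args["restart_count"] = str(count + 1)
    return new_args


# The restart itself would presumably go through python-scrapinghub,
# roughly like this (untested sketch):
#
#   from scrapinghub import ScrapinghubClient
#   client = ScrapinghubClient(apikey)
#   project = client.get_project(project_id)
#   args = next_restart_args(current_job_args)
#   if args is not None:
#       project.jobs.run(spider_name, job_args=args)
```

A custom Spidermon action could wrap this in its `run_action` method, reading the current count from the running spider's arguments and scheduling the new job only when `next_restart_args` returns something.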
Should we create this coupling of Spidermon and Scrapy cloud?
@rennerocha said he didn't see a problem, so I made this ticket, but I may have misread his response. I have no strong opinion either way. Since it's a common use case at the moment, I think it would be more convenient, but I can understand wanting to limit the number of dependencies.
This could work out well, but what exactly is the use case here? Something like restarting after a particular amount of time under specific circumstances?
I'm not sure there's a single specific use case, but this comes up every once in a while. Sometimes jobs get stuck for some reason and restarting them gets them going again, or the upstream website starts having issues that corrupt the data, and it's not worth the effort of trying to recover.