Scrapers with stopped containers can't be manually stopped & don't get restarted
Reported by @jennahowe:
The 'stop scraper' button isn't doing anything.
It's been running for about a day now https://morph.io/jennahowe/El_Salvador_Legislative_Assembly, so I guess it will be killed automatically soon when it goes over 24 hours.
It should have just stopped of its own accord ages ago because it errored pretty much straight away (bad gateway error), but that hasn't happened.
I found a similar PlanningAlerts scraper that had been "running" for 2 months. It had a stopped container so I:
r = Morph::Runner.new(Run.find(132141))
r.go_with_logging
That connected to the stopped container and finished the run. Now the scraper is working again.
How should we fix this in the long term?
I just noticed that, because these running scrapers never error, they don't even show up in your alert email :-/
This is affecting ~35 PlanningAlerts scrapers (out of 115) so we might want to fix this sooner rather than later :boom: Or at least we should manually do what I did above.
The problematic PlanningAlerts ones have their last_run marked as queued but they're not in the actual queue. Many of them have been "queued" for 6 days.
They're not isolated:
Run.where(finished_at: nil, started_at: nil).where('queued_at IS NOT NULL').count
# => 131
This seems to have cleared the ones that were previously sitting at the bottom of the PlanningAlerts list:
Run.where(finished_at: nil, started_at: nil).where('queued_at IS NOT NULL').each { |r| RunWorker.perform_async(r.id) }
We have two issues here. The issue I reported initially is different to the one I subsequently worked on above.
The initially reported one was where there was a stopped container for a "running" scraper. Since half a dozen or so PlanningAlerts ones were still in this state, I ran the following code to try and clear out that backlog:
containers = Morph::DockerUtils.stopped_containers
# Keep only the stopped containers that belong to a morph.io run
containers = containers.select { |c| Morph::Runner.run_id_for_container(c) }
containers.each do |container|
  run = Morph::Runner.run_for_container(container)
  # Re-enqueue a worker for any run that morph.io still thinks is running
  RunWorker.perform_async(run.id) if run.running?
end
It seems to be working, and uncovering all sorts of other errors that we probably need to handle (such as when scrapers are deleted, or there's a problem opening the DB).
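For what it's worth, a slightly more defensive version of that loop might look like the sketch below. The rescue is an assumption on my part about what gets raised when a scraper has been deleted (errors opening the DB will still surface inside RunWorker itself), so treat it as a starting point rather than the actual fix:
# Sketch: requeue runs for stopped containers, skipping deleted scrapers
Morph::DockerUtils.stopped_containers.each do |container|
  run_id = Morph::Runner.run_id_for_container(container)
  next unless run_id
  begin
    run = Run.find(run_id)
  rescue ActiveRecord::RecordNotFound
    next # the run (or its scraper) has been deleted; nothing to requeue
  end
  RunWorker.perform_async(run.id) if run.running?
end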
https://morph.io/tmtmtmtm/bulgaria-parliament is also currently unstoppable, and has been stuck for 4 days. I'd rather not have to delete and reimport to start it again…
https://morph.io/tmtmtmtm/poland-sejm-wikidata has also been stuck for several days.
@tmtmtmtm you were spot on in guessing what the problem was. I've manually fixed these.
I'm still not sure what the long term fix for this is.
Thanks @henare — is there some way you could expose the ability for users to more completely 'kill' a stuck scraper when stopping it doesn't work? A few times I've been frustrated enough to simply delete and reimport, but that's far from ideal, especially if a database is storing historic records. It would, of course, be ideal if scrapers never actually got stuck, but being able to work around it when it happens would be the next best thing :)
Yeah I think we need something here that handles this case but I'm just not sure what, yet :wink: :thought_balloon:
Hello, please also stop this one: https://morph.io/soit-sk/trademarks_-_upv_sk
@katkad I've kicked off the stuck run and it's trying to finish now. If it's still a problem, can you please post a message in the help forum? https://help.morph.io/ [Update: Sorry, I just noticed you tried to but the post needed approval for some reason]
Of late there's been a build-up of scrapers sitting on this page: https://morph.io/scrapers/running with no corresponding running job in Sidekiq. These scrapers can never finish and can never be stopped, since they usually have a container but no corresponding watch process.
Until we get time to fix this properly or someone else comes up with a solution, from time to time I've been running the following. It creates a watch process for all "running" scrapers. The IDs in it are the run IDs that already have a Sidekiq job (we don't want to create duplicates):
Scraper.running.each { |s| RunWorker.perform_async(s.last_run.id) unless [202723, 202726].include?(s.last_run.id) }
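Rather than hard-coding those IDs each time, something like this sketch could pull the busy run IDs straight out of Sidekiq's API. The defensive JSON.parse is there because work['payload'] is a Hash in some Sidekiq versions and a JSON string in others:
require 'json'

# Collect the run IDs that already have a live RunWorker job in Sidekiq
active_run_ids = []
Sidekiq::Workers.new.each do |_process_id, _thread_id, work|
  payload = work['payload']
  payload = JSON.parse(payload) if payload.is_a?(String)
  active_run_ids << payload['args'].first if payload['class'] == 'RunWorker'
end

# Create a watch process for every "running" scraper that doesn't have one
Scraper.running.each do |s|
  RunWorker.perform_async(s.last_run.id) unless active_run_ids.include?(s.last_run.id)
end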
Just wanted to check in to see what's up with this issue and whether it has been resolved. My scraper is currently not responding either! https://morph.io/Charrod/CSGOTeamBotScraper
@Charrod my last comment above is still the latest on this issue :( It would be so good to get a fix in for this as it continues to affect a few scrapers here and there, such as yours. I've run the above command and your scraper is moving again.
@henare Sorry to bother you, but it appears my scraper is yet again stuck. Is this a fault of my code or some glitch within Morph?
@Charrod if you can't stop your scraper it's a sure sign of this bug and not your scraper. Maybe you'd like to have a go at fixing this issue or becoming a supporter to continue the development and operation of morph.io?
I think one of the reasons your scraper has been affected by this issue is that you're outputting tens of thousands of lines (43,110 lines in the current run). Of course this shouldn't matter, but until this bug is fixed it could help not to output so many lines from your scraper.
Anyway, I've got your scraper running again. If you have similar problems like this in the future, please post them to the help forum.
I just needed to test this, and this is the rough process I went through (see the sketch after the list for confirming the stuck state):
- Start scraper
- Wait for a container to boot
- Make the job crash (I think I killed foreman while it was running)
- Delete the job from Sidekiq's retry queue
- Start everything back up
- Job is stuck and can't be stopped
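For the record, a quick way to confirm that final state, as a sketch reusing the calls from earlier in this thread (test_run_id is hypothetical, whatever ID the test run got):
run = Run.find(test_run_id) # hypothetical: the ID of the test run
run.running?                # => true, so morph.io still thinks it's going

# A stopped container for the run, with no Sidekiq job watching it, is this bug
stuck_container = Morph::DockerUtils.stopped_containers.find do |c|
  Morph::Runner.run_id_for_container(c) == run.id
end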