Scrapers with stopped containers can't be manually stopped & don't get restarted
Reported by @jennahowe:
The 'stop scraper' button isn't doing anything.
It's been running for about a day now https://morph.io/jennahowe/El_Salvador_Legislative_Assembly, so I guess it will be killed automatically soon when it goes over 24 hours.
It should have just stopped of its own accord ages ago because it errored pretty much straight away (bad gateway error), but that hasn't happened.
I found a similar PlanningAlerts scraper that had been "running" for 2 months. It had a stopped container so I:
r = Morph::Runner.new(Run.find(132141))
r.go_with_logging
That connected to the stopped container and finished the run. Now the scraper is working again.
How should we fix this in the long term?
I just noticed that, because these running scrapers never error, they don't even show up in your alert email :-/
This is affecting ~35 PlanningAlerts scrapers (out of 115) so we might want to fix this sooner rather than later :boom: Or at least we should manually do what I did above.
The problematic PlanningAlerts ones have their last_run marked as queued but they're not in the actual queue. Many of them have been "queued" for 6 days.
They're not isolated:
Run.where(finished_at: nil, started_at: nil).where('queued_at IS NOT NULL').count
# => 131
This seems to have cleared the ones that were previously sitting at the bottom of the PlanningAlerts list:
Run.where(finished_at: nil, started_at: nil).where('queued_at IS NOT NULL').each { |r| RunWorker.perform_async(r.id) }
We have two issues here. The issue I reported initially is different to the one I subsequently worked on above.
The initially reported one was where there was a stopped container for a "running" scraper. Since half a dozen or so PlanningAlerts ones were still in this state, I ran the following code to try and clear out that backlog:
containers = Morph::DockerUtils.stopped_containers
# Keep only the stopped containers that belong to a morph.io run
containers = containers.select { |c| Morph::Runner.run_id_for_container(c) }
containers.each do |container|
  run = Morph::Runner.run_for_container(container)
  # Re-enqueue a worker for any run that morph.io still thinks is running
  RunWorker.perform_async(run.id) if run.running?
end
It seems to be working, and uncovering all sorts of other errors that we probably need to handle (such as when scrapers are deleted, or there's a problem opening the DB).
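For what it's worth, a slightly more defensive version of that loop might look like the sketch below. The rescue is an assumption on my part about what gets raised when a scraper has been deleted (errors opening the DB will still surface inside RunWorker itself), so treat it as a starting point rather than the actual fix:
# Sketch: requeue runs for stopped containers, skipping deleted scrapers
Morph::DockerUtils.stopped_containers.each do |container|
  run_id = Morph::Runner.run_id_for_container(container)
  next unless run_id
  begin
    run = Run.find(run_id)
  rescue ActiveRecord::RecordNotFound
    next # the run (or its scraper) has been deleted; nothing to requeue
  end
  RunWorker.perform_async(run.id) if run.running?
end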
https://morph.io/tmtmtmtm/bulgaria-parliament is also currently unstoppable, and has been stuck for 4 days. I'd rather not have to delete and reimport to start it again…
https://morph.io/tmtmtmtm/poland-sejm-wikidata has also been stuck for several days.
@tmtmtmtm you were spot on in guessing what the problem was. I've manually fixed these.
I'm still not sure what the long term fix for this is.
Thanks @henare — is there some way you could expose the ability for users to more completely 'kill' a stuck scraper when stopping it doesn't work? A few times I've been frustrated enough to simply delete and reimport, but that's far from ideal, especially if a database is storing historic records. It would, of course, be ideal if scrapers never actually got stuck, but being able to work around it when it happens would be the next best thing :)
Yeah I think we need something here that handles this case but I'm just not sure what, yet :wink: :thought_balloon:
Hello, please also stop this one: https://morph.io/soit-sk/trademarks_-_upv_sk
@katkad I've kicked off the stuck run and it's trying to finish now. If it's still a problem, can you please post a message in the help forum? https://help.morph.io/ [Update: Sorry, I just noticed you tried to but the post needed approval for some reason]
Of late there's been a build-up of scrapers sitting on this page: https://morph.io/scrapers/running with no corresponding running job in Sidekiq. These scrapers can never finish and can never be stopped, since they usually have a container but no corresponding watch process.
Until we get time to fix this properly or someone else comes up with a solution, from time to time I've been running the following. It creates a watch process for all "running" scrapers. The IDs in it are the run IDs that already have a Sidekiq job (we don't want to create duplicates):
Scraper.running.each { |s| RunWorker.perform_async(s.last_run.id) unless [202723, 202726].include?(s.last_run.id) }
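Rather than hard-coding those IDs each time, something like this sketch could pull the busy run IDs straight out of Sidekiq's API. The defensive JSON.parse is there because work['payload'] is a Hash in some Sidekiq versions and a JSON string in others:
require 'json'

# Collect the run IDs that already have a live RunWorker job in Sidekiq
active_run_ids = []
Sidekiq::Workers.new.each do |_process_id, _thread_id, work|
  payload = work['payload']
  payload = JSON.parse(payload) if payload.is_a?(String)
  active_run_ids << payload['args'].first if payload['class'] == 'RunWorker'
end

# Create a watch process for every "running" scraper that doesn't have one
Scraper.running.each do |s|
  RunWorker.perform_async(s.last_run.id) unless active_run_ids.include?(s.last_run.id)
end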
Just wanted to check in to see what's up with this issue and whether it has been resolved. My scraper is currently not responding either! https://morph.io/Charrod/CSGOTeamBotScraper
@Charrod my last comment above is still the latest on this issue :( It would be so good to get a fix in for this as it continues to affect a few scrapers here and there, such as yours. I've run the above command and your scraper is moving again.
@henare Sorry to bother you, but it appears my scraper is yet again stuck. Is this a fault of my code or some glitch within Morph?
@Charrod if you can't stop your scraper it's a sure sign of this bug and not your scraper. Maybe you'd like to have a go at fixing this issue or becoming a supporter to continue the development and operation of morph.io?
I think one of the reasons your scraper has been affected by this issue is that you're outputting tens of thousands of lines (43,110 lines in the current run). Of course this shouldn't matter, but until this bug is fixed it could help not to output so many lines from your scraper.
Anyway, I've got your scraper running again. If you have similar problems like this in the future, please post them to the help forum.
I just needed to test this, and this is the rough process I went through (see the sketch after the list for confirming the stuck state):
- Start scraper
- Wait for a container to boot
- Make the job crash (I think I killed foreman while it was running)
- Delete the job from Sidekiq's retry queue
- Start everything back up
- Job is stuck and can't be stopped
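For the record, a quick way to confirm that final state, as a sketch reusing the calls from earlier in this thread (test_run_id is hypothetical, whatever ID the test run got):
run = Run.find(test_run_id) # hypothetical: the ID of the test run
run.running?                # => true, so morph.io still thinks it's going

# A stopped container for the run, with no Sidekiq job watching it, is this bug
stuck_container = Morph::DockerUtils.stopped_containers.find do |c|
  Morph::Runner.run_id_for_container(c) == run.id
end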