
Backend throws Docker::Error::NotFoundError errors occasionally since recent upgrades

Open henare opened this issue 9 years ago • 5 comments

It's happening at the point in the code where morph.io tries to attach to a stopped container and finish the run.

[Morph/production] Docker::Error::NotFoundError: open /var/lib/docker/containers/1bdb967ce0f3933068d77cf3d1b93411f29ec6067d556b30f555464c3778894d/1bdb967ce0f3933068d77cf3d1b93411f29ec6067d556b30f555464c3778894d-json.log: no such file or directory

Backtrace

line 118 of [PROJECT_ROOT]/lib/morph/docker_runner.rb: attach_to_run_and_finish
line 103 of [PROJECT_ROOT]/lib/morph/runner.rb: attach_to_run_and_finish
line 50 of [PROJECT_ROOT]/lib/morph/runner.rb: go

View full backtrace and more info at honeybadger.io

henare avatar Jul 22 '16 04:07 henare

There are 100 failed jobs on the queue right now, most of them due to this error. So this definitely needs fixing.

mlandauer avatar Jul 27 '16 08:07 mlandauer

This is what I've figured out so far. The containers where that error occurs are all marked with status "dead". In the /var/lib/docker/containers directory there is no directory for such a container. So it looks like something is going wrong with the container and it's not getting cleaned up fully: the container data is gone, but Docker still holds a reference to it.

The workaround for the time being is to remove the dead containers and rerun the jobs, which seems to clear things out.

We'll need a bit more of a clue as to why this is happening in the first place.
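
For reference, a rough sketch of that cleanup step in Ruby. This is not code from morph itself: `remove_dead_containers` and `DEAD_FILTER` are made-up names for illustration, and the commented usage assumes the docker-api gem's `Docker::Container.all` and `Docker::Container#delete`. Rerunning the affected jobs is left out.

```ruby
require 'json'

# Filter for Docker's "list containers" call that selects only containers
# in the "dead" state. (Hypothetical constant name, for illustration only.)
DEAD_FILTER = { status: ['dead'] }.to_json

# Force-removes each container it is given and returns the ids that were
# removed. `containers` can be any enumerable whose elements respond to
# `id` and `delete(force: true)`. (Hypothetical helper, not part of morph.)
def remove_dead_containers(containers)
  containers.each_with_object([]) do |container, removed|
    begin
      container.delete(force: true)
      removed << container.id
    rescue StandardError => e
      # Keep going so one stubborn container does not block the rest.
      warn "Could not remove #{container.id}: #{e.message}"
    end
  end
end

# Assumed usage with the docker-api gem:
#   require 'docker'
#   dead = Docker::Container.all(all: true, filters: DEAD_FILTER)
#   remove_dead_containers(dead)
```

Passing the container list in, rather than looking it up inside the method, makes it possible to try the logic without a Docker daemon.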

mlandauer avatar Jul 27 '16 17:07 mlandauer

Yesterday I upgraded the Docker server again. As far as I can tell, the problem has gone away. Let's leave it for a few more days.

mlandauer avatar Aug 03 '16 02:08 mlandauer

It's still happening occasionally :-(

mlandauer avatar Aug 07 '16 17:08 mlandauer

This is a big contributing factor to #1098. When it happens to a job, the job takes up a slot, keeps retrying, and never finishes. So if you get a few like this, the slots start filling up.

henare avatar Nov 17 '16 04:11 henare