Long downtime during restart of multiple containers that are based on the same image

Open Rush opened this issue 5 years ago • 21 comments

Let's say we have 10 containers based on the same image. Upon update watchtower will:

  • stop and remove all containers
  • re-create all containers

This causes downtime of N * (time to stop and start a container) - where N is the number of containers.

It would be nice if watchtower had an algorithm to:

  • For each container:
    • stop and remove container
    • re-create container.
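The difference between the two strategies can be sketched in Go as follows. This is a minimal illustration, not watchtower's code: the `restarter` interface, `fakeClient`, `batchRestart`, `rollingRestart`, and `maxDown` are all hypothetical names made up for this example. It only shows that stopping everything first leaves all N containers down at once, while a per-container loop keeps at most one down at a time.

```go
package main

import (
	"fmt"
	"strings"
)

// restarter is a hypothetical interface standing in for whatever stops and
// starts containers. It is NOT watchtower's actual client API.
type restarter interface {
	Stop(name string) error
	Start(name string) error
}

// fakeClient records the order of calls so the two strategies can be compared.
type fakeClient struct {
	events []string
}

func (f *fakeClient) Stop(name string) error {
	f.events = append(f.events, "stop "+name)
	return nil
}

func (f *fakeClient) Start(name string) error {
	f.events = append(f.events, "start "+name)
	return nil
}

// batchRestart mirrors the behavior described in this issue:
// stop every container first, then start every container.
func batchRestart(c restarter, names []string) error {
	for _, n := range names {
		if err := c.Stop(n); err != nil {
			return err
		}
	}
	for _, n := range names {
		if err := c.Start(n); err != nil {
			return err
		}
	}
	return nil
}

// rollingRestart is the proposed alternative: finish one container
// (stop, then start) before touching the next.
func rollingRestart(c restarter, names []string) error {
	for _, n := range names {
		if err := c.Stop(n); err != nil {
			return err
		}
		if err := c.Start(n); err != nil {
			return err
		}
	}
	return nil
}

// maxDown returns the largest number of containers that were stopped
// simultaneously according to the recorded event log.
func maxDown(events []string) int {
	down, peak := 0, 0
	for _, e := range events {
		if strings.HasPrefix(e, "stop ") {
			down++
		} else {
			down--
		}
		if down > peak {
			peak = down
		}
	}
	return peak
}

func main() {
	names := []string{"web1", "web2", "web3"}

	batch := &fakeClient{}
	_ = batchRestart(batch, names)
	fmt.Printf("batch:   at most %d down at once\n", maxDown(batch.events))

	rolling := &fakeClient{}
	_ = rollingRestart(rolling, names)
	fmt.Printf("rolling: at most %d down at once\n", maxDown(rolling.events))
}
```

With three containers, the batch strategy reports 3 down at once and the rolling strategy reports 1. A real implementation would presumably also have to respect container dependencies and links, which this sketch ignores.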

Is it possible? Is it a planned feature? Is it a known issue?

Rush avatar Apr 07 '19 02:04 Rush

I've noticed the same thing and if the above could be implemented, that would be awesome :-)

SmallFriendlyKiwi avatar Apr 07 '19 09:04 SmallFriendlyKiwi

Thanks for your issue! This is definitely something we should take a look at. If you feel up for it, feel free to submit a pull request and I'll have a look. 👍

simskij avatar Apr 07 '19 12:04 simskij

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 02 '19 07:06 stale[bot]

Why would a real issue be closed due to inactivity?

Rush avatar Jun 02 '19 08:06 Rush

We're still tuning stale-bot. Some false positives remain to be ironed out.

simskij avatar Jun 02 '19 09:06 simskij

And just to elaborate a bit on why we do this; I think the stale issues repo explains it all in a very good way:

In an ideal world with infinite resources, there would be no need for this app.

But in any successful software project, there's always more work to do than people to do it. As more and more work piles up, it becomes paralyzing. Just making decisions about what work should and shouldn't get done can exhaust all available resources. In the experience of the maintainers of this app—and the hundreds of other projects and organizations that use it—focusing on issues that are actively affecting humans is an effective method for prioritizing work.

To some, a robot trying to close stale issues may seem inhospitable or offensive to contributors. But the alternative is to disrespect them by setting false expectations and implicitly ignoring their work. This app makes it explicit: if work is not progressing, then it's stale. A comment is all it takes to keep the conversation alive.

With that said, your issue has been added to a milestone, as this might become an actual problem, and as such it won't be marked as stale.

Thanks for understanding. 🙏

simskij avatar Jun 02 '19 11:06 simskij

I stopped using watchtower because of this issue.

demyxco avatar Sep 24 '19 09:09 demyxco

I am looking for a way to instruct watchtower not to stop all my containers at the same time. This is really a problem! Let's say you have 3 instances behind a load balancer: watchtower will stop them all.

smallswan399 avatar Oct 21 '19 13:10 smallswan399

As a work-around, you might run multiple watchtower instances, one instance for each container you want to monitor.

donce avatar Dec 15 '19 13:12 donce

Is this still an issue? I'm thinking about adopting watchtower, but with this kind of behavior it won't work for my scenario. I have more than one hundred containers using the same image on the same server. I really need something closer to what the OP described.

matheuscmpm avatar Feb 20 '20 17:02 matheuscmpm

Yes, this is still how it works. However, I'd be more than open to changing this behavior, although it would require some help from the community as I, to be fair, lack time at this point.

simskij avatar Mar 21 '20 13:03 simskij

Greetings @simskij !

Is this issue open to be worked on? I'd love to have a go at it if available.

Thank you!

vrajashkr avatar Aug 15 '20 05:08 vrajashkr

For sure, go for it! 🙏🏼

simskij avatar Aug 15 '20 06:08 simskij

Thank you!

@simskij I ran into some trouble while trying out the application. Should I describe it here or on Gitter?

vrajashkr avatar Aug 15 '20 08:08 vrajashkr

Here is better if someone else wants to assist, but Gitter works just as well! 👌

simskij avatar Aug 15 '20 08:08 simskij

Awesome!

Here is the issue I ran into:

DEBU[0100] Got image name: altariax0x01/mybuntu:latest  
INFO[0100] Found new altariax0x01/mybuntu:latest image (sha256:77e1d6c5b9c0f022928f1732791ccd12fcb6029baf686b4cfcebafe7dbce6ec7) 
INFO[0100] Stopping /t1 (bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6) with SIGTERM 
DEBU[0101] Removing container bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6 
2020/08/15 15:46:46 cron: panic running job: runtime error: invalid memory address or nil pointer dereference
goroutine 13 [running]:
github.com/robfig/cron.(*Cron).runWithRecovery.func1(0xc0002c8500)
        /home/ubuntu/go/pkg/mod/github.com/robfig/[email protected]/cron.go:161 +0x9e
panic(0xae3ba0, 0x1021190)
        /home/ubuntu/go/src/runtime/panic.go:969 +0x175
github.com/containrrr/watchtower/pkg/container.Container.runtimeConfig(0x100, 0xc000485d40, 0x0, 0xc000392480)
        /home/ubuntu/watchtower/pkg/container/container.go:169 +0x4e
github.com/containrrr/watchtower/pkg/container.dockerClient.StartContainer(0xc89b40, 0xc00030c700, 0x1, 0x920100, 0xc000485d40, 0x0, 0x1, 0xc000020100, 0xc000485d40, 0x0)
        /home/ubuntu/watchtower/pkg/container/client.go:163 +0x86
github.com/containrrr/watchtower/internal/actions.restartStaleContainer(0x7faf5b8d0100, 0xc000485d40, 0x0, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:121 +0xdd
github.com/containrrr/watchtower/internal/actions.restartContainersInSortedOrder(0xc0003e2420, 0x1, 0x1, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:96 +0x255
github.com/containrrr/watchtower/internal/actions.Update(0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0, 0x1abab3a6, 0x2000000030001)
        /home/ubuntu/watchtower/internal/actions/update.go:53 +0x369
github.com/containrrr/watchtower/cmd.runUpdatesWithNotifications(0xc00002f960)
        /home/ubuntu/watchtower/cmd/root.go:211 +0xb3
github.com/containrrr/watchtower/cmd.runUpgradesOnSchedule.func1()
        /home/ubuntu/watchtower/cmd/root.go:168 +0xb6
github.com/robfig/cron.FuncJob.Run(0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/[email protected]/cron.go:92 +0x25
github.com/robfig/cron.(*Cron).runWithRecovery(0xc0002c8500, 0xc6dde0, 0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/[email protected]/cron.go:165 +0x59
created by github.com/robfig/cron.(*Cron).run
        /home/ubuntu/go/pkg/mod/github.com/robfig/[email protected]/cron.go:199 +0x76a

Steps to reproduce:

  1. Clone repo
  2. Build watchtower
  3. Create test container with test image
  4. Start watchtower
  5. Update test image
  6. Push image to DockerHub

Expected:

The container is stopped and restarted with the new version of the base image.

What actually happened:

The container is stopped and removed, but the program panics while trying to start the replacement, so the container is never restarted.
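To illustrate the general failure mode only (not the actual cause, which is not diagnosed here), the following is a minimal Go sketch of a nil pointer dereference that occurs after the old container has been removed and is then caught by a recovering job wrapper, much as the `cron: panic running job` line in the log suggests. All types and function names here (`container`, `containerInfo`, `imageName`, `runWithRecovery`, `simulateUpdate`) are hypothetical and are not taken from watchtower or robfig/cron.

```go
package main

import "fmt"

// config and containerInfo are hypothetical stand-ins for inspected
// container metadata. They are NOT watchtower's real types.
type config struct {
	Image string
}

type containerInfo struct {
	Config *config
}

type container struct {
	info *containerInfo // nil if the metadata was never populated
}

// imageName dereferences info without a nil check, so it panics
// with a nil pointer dereference when info is nil.
func (c container) imageName() string {
	return c.info.Config.Image
}

// runWithRecovery runs job and turns a panic into an error instead of
// crashing the process, similar in spirit to a scheduler's recovery wrapper.
func runWithRecovery(job func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic running job: %v", r)
		}
	}()
	job()
	return nil
}

// simulateUpdate records which steps completed before the panic.
func simulateUpdate() ([]string, error) {
	var steps []string
	err := runWithRecovery(func() {
		steps = append(steps, "stopped and removed old container")
		c := container{}  // info is nil in this sketch
		_ = c.imageName() // panics here
		steps = append(steps, "started new container") // never reached
	})
	return steps, err
}

func main() {
	steps, err := simulateUpdate()
	fmt.Println("completed steps:", steps)
	fmt.Println("error:", err)
}
```

Because the panic is recovered, the process keeps running, but only the first step completes. That matches the symptom above: the old container is gone and no replacement is started.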

Environment:

  • OS: Ubuntu 20.04.1 LTS running on an AWS EC2 instance
  • Docker server version: 19.03.12
  • Go version: go1.15 linux/amd64

Any advice?

Thank you!

vrajashkr avatar Aug 15 '20 15:08 vrajashkr

Yeah, this is because of this: https://github.com/containrrr/watchtower/pull/612

You can base it on that branch to get started, or I will get it merged to master tomorrow!

piksel avatar Aug 15 '20 22:08 piksel

@piksel Thank you for the information! I'll get started with that branch to test my changes. I can make a PR for the changes once that branch is merged into master.

vrajashkr avatar Aug 16 '20 05:08 vrajashkr

I know it's been a while. :) Likely there has been no progress but it doesn't hurt to ask.

Rush avatar Feb 03 '24 06:02 Rush