Adding parameter to allow containers to be updated only after some number of days have passed since the new image was published
This PR adds a new optional flag, --delay-days, that tells watchtower to wait before updating a container to a new image until a given number of days have passed since the image was published. This is useful when an image is published with a major defect that is only noticed after it has been installed by some portion of the user base. Instead of updating on the same day such an image is published, watchtower can now wait a day, a week, or however long the user wishes, giving more confidence that the image has already been installed on many other machines without a serious issue that forced the devs to publish a defect-fixing follow-up. Watchtower therefore still keeps the user's containers up to date, but with reduced risk.
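To illustrate the core of the idea, here is a minimal, self-contained sketch of the check; the names below are made up for illustration and do not match the actual code in this PR. An update is deferred while the newly published image is younger than the configured number of days, and a value of 0 (or leaving the flag unset) keeps today's behavior.

```go
package main

import (
	"fmt"
	"time"
)

// shouldDeferUpdate is a hypothetical helper illustrating the rule:
// defer the update while the new image is younger than delayDays days.
func shouldDeferUpdate(imageCreated time.Time, delayDays int) bool {
	if delayDays <= 0 {
		return false // flag unset or 0: update immediately, as before
	}
	return time.Since(imageCreated) < time.Duration(delayDays)*24*time.Hour
}

func main() {
	created := time.Now().Add(-6 * time.Hour)  // image published 6 hours ago
	fmt.Println(shouldDeferUpdate(created, 1)) // true: wait until it is 1 day old
	fmt.Println(shouldDeferUpdate(created, 0)) // false: no delay configured
}
```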
closes #1879
Testing with
watchtower --log-level info --interval 30 --delay-days 1
I started with 4 containers running locally, all of which were on latest. The usual watchtower output is seen, with a note at startup indicating that delayed updates are active and for how many days:
```
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Watchtower "
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Using no notifications"
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Checking all containers (except explicitly disabled with label)"
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Scheduling first run: 2023-12-18 02:40:24 +0000 UTC"
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Note that the first check will be performed in 29 seconds"
2023-12-17 20:39:54 time="2023-12-18T02:39:54Z" level=info msg="Container updates will be delayed until 1 day(s) after image creation."
2023-12-17 20:40:25 time="2023-12-18T02:40:25Z" level=info msg="Session done" Failed=0 Scanned=4 Updated=0 notify=no
2023-12-17 20:40:54 time="2023-12-18T02:40:54Z" level=info msg="Session done" Failed=0 Scanned=4 Updated=0 notify=no
```
Then I modified a line in one of the apps and issued a new docker build and push for the new version, making it the latest. Now, at each interval, watchtower sees that the new image exists but, since it hasn't yet been a full day since it was published, it doesn't perform the update; it does note the reason in the log:
```
2023-12-17 20:41:54 time="2023-12-18T02:41:54Z" level=info msg="New image found for /my-local-app that was created 0 day(s) ago but update delayed until 1 day(s) after creation"
2023-12-17 20:41:54 time="2023-12-18T02:41:54Z" level=info msg="Session done" Failed=0 Scanned=4 Updated=0 notify=no
```
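For reference, the rebuild-and-push step was nothing special, roughly the usual pair of commands (assuming the local registry at localhost:5000 that shows up in the later logs):

```
docker build -t localhost:5000/my-local-app:latest .
docker push localhost:5000/my-local-app:latest
```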
After that version gets to be 1 full day old, it will update as normal. I didn't retain my logs for that happening last night in my testing, but it did work. I'll drop a comment on this PR once my latest run gets to that point again so you can see the logs for that.
If watchtower is run without the --delay-days flag, or with a value of 0, it behaves exactly as it does now. I can supply a log of that too if you like.
As this is my first contribution to this repo, please just let me know if you need any other info or changes.
Thanks for the input and clarification on direction. I have made the requested changes and in my testing so far it appears to be working, though I need to wait until about 5pm ET tomorrow for my latest image to meet my threshold to be sure. I'll comment on the results of that test tomorrow. Since it took me a bit of time to understand some of the existing code, I also added a few comments that I hope will be useful to others.
Any other changes needed, please just let me know.
The testing results indicate that it's working as expected, e.g.
```
$ ./watchtower.exe --host tcp://localhost:2375 --interval 3600 --delay-days 1
time="2023-12-21T19:41:59-05:00" level=info msg="Watchtower v0.0.0-unknown"
time="2023-12-21T19:41:59-05:00" level=info msg="Using no notifications"
time="2023-12-21T19:41:59-05:00" level=info msg="Checking all containers (except explicitly disabled with label)"
time="2023-12-21T19:41:59-05:00" level=info msg="Scheduling first run: 2023-12-21 20:41:59 -0500 EST"
time="2023-12-21T19:41:59-05:00" level=info msg="Note that the first check will be performed in 59 minutes, 59 seconds"
time="2023-12-21T19:41:59-05:00" level=info msg="Container updates will be delayed until 1 day(s) after image creation."
time="2023-12-22T00:13:36-05:00" level=info msg="Found new localhost:5000/my-local-app:latest image (0cc5cc140adb)"
time="2023-12-22T00:13:36-05:00" level=info msg="New image found for /intelligent_varahamihira that was created 0 day(s) ago but update delayed until 1 day(s) after creation"
time="2023-12-22T00:13:37-05:00" level=info msg="Session done" Failed=0 Scanned=3 Updated=1 notify=no
time="2023-12-22T01:13:36-05:00" level=info msg="Found new localhost:5000/my-local-app:latest image (0cc5cc140adb)"
time="2023-12-22T01:13:36-05:00" level=info msg="New image found for /intelligent_varahamihira that was created 0 day(s) ago but update delayed until 1 day(s) after creation"
time="2023-12-22T01:13:37-05:00" level=info msg="Session done" Failed=0 Scanned=3 Updated=1 notify=no
```
...
time="2023-12-22T17:32:18-05:00" level=info msg="Found new localhost:5000/my-local-app:latest image (0cc5cc140adb)" time="2023-12-22T17:32:19-05:00" level=info msg="Stopping /intelligent_varahamihira (e21283352217) with SIGTERM" time="2023-12-22T17:32:29-05:00" level=info msg="Creating /intelligent_varahamihira" time="2023-12-22T17:32:30-05:00" level=info msg="Session done" Failed=0 Scanned=3 Updated=1 notify=no time="2023-12-22T18:32:19-05:00" level=info msg="Session done" Failed=0 Scanned=3 Updated=0 notify=no
I still think marking it as not stale may have consequences for other interactions, like notifications and metrics. I'll take a look when I am at a computer.
The code was hard for me to follow in terms of what the definition of "stale" is and what it implies to mark a container that way or not. It's also not clear to me why all containers are added to containersToUpdate rather than just the ones that are actually to be updated, as the variable name implies. It works as-is because stopContainersInReversedOrder and restartContainersInSortedOrder each return without doing anything if ToRestart() is false, which it will be if stale is false, but I would have thought it better to avoid adding items to containersToUpdate unless they really are due for an update, i.e. they're stale (and, in this case, past the delay-days waiting period). Rather than cleaning up that logic, since it seemed like it must be the way it is for a reason, I just piggybacked on it and expanded the idea of "stale" to really mean "outdated and ready to be updated". I'm happy to take a deeper cut at reorganizing that bit if you think that's better; just let me know.
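To make that concrete, here is a rough, self-contained sketch of the alternative I had in mind; the struct and field names are made up for illustration and are not the existing watchtower types. The point is just that only containers that are stale and past the delay window ever get queued, so nothing downstream needs to rely on ToRestart() silently doing nothing.

```go
package main

import (
	"fmt"
	"time"
)

// container is a stripped-down stand-in with only the fields needed to
// illustrate the filtering idea (not the real watchtower container type).
type container struct {
	name         string
	monitorOnly  bool
	stale        bool      // a newer image exists for this container
	imageCreated time.Time // creation time of that newer image
}

func shouldDefer(created time.Time, delayDays int) bool {
	return delayDays > 0 && time.Since(created) < time.Duration(delayDays)*24*time.Hour
}

func main() {
	delayDays := 1
	containers := []container{
		{name: "fresh-image", stale: true, imageCreated: time.Now().Add(-2 * time.Hour)},
		{name: "day-old-image", stale: true, imageCreated: time.Now().Add(-26 * time.Hour)},
		{name: "up-to-date", stale: false},
	}

	// Only containers that are stale AND past the delay window are queued.
	var containersToUpdate []container
	for _, c := range containers {
		if c.monitorOnly || !c.stale || shouldDefer(c.imageCreated, delayDays) {
			continue
		}
		containersToUpdate = append(containersToUpdate, c)
	}

	for _, c := range containersToUpdate {
		fmt.Println(c.name) // only "day-old-image" is queued
	}
}
```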
I also just realized I forgot to back out some changes to client.go from before so I'm about to push an update for that too.
Codecov Report
Attention: 27 lines in your changes are missing coverage. Please review.
Comparison is base (0a14f3a) 70.55% compared to head (ac8506a) 69.96%. Report is 3 commits behind head on main.
:exclamation: Current head ac8506a differs from pull request most recent head 6925130. Consider uploading reports for the commit 6925130 to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| internal/actions/update.go | 20.58% | 25 Missing and 2 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1884 +/- ##
==========================================
- Coverage 70.55% 69.96% -0.60%
==========================================
Files 26 26
Lines 2493 2530 +37
==========================================
+ Hits 1759 1770 +11
- Misses 633 658 +25
- Partials 101 102 +1
:umbrella: View full report in Codecov by Sentry.
I started making some changes to your branch, but it ended up mostly being a cleanup of the current update code, so I made it a separate PR (#1895). It should make it much easier to integrate your changes, and to understand the code in the future.
When it's merged, I'll rebase this branch on top of it and make a suggestion for how to integrate it. Sorry for the delay!
Thanks for the updates. I just realized yesterday that there was a codecov action pending on me, so I spent some time figuring out how to add test cases for my changes. I'm nearly done with that and will push an update when I finish; then, once you're done with the other PR, it should be pretty easy to wrap all of this up.
If you don't mind, I'll add some comments on your other PR for areas of the latest code that I still find confusing and that might at least be worth a comment to help explain the logic, since you're doing a bit of a cleanup anyway. Fresh eyes and all that. For example, at the moment in my test case I'm trying to use the container's "state" to tell me whether it was ultimately updated to a new image, like this:
Expect(report.Updated()).To(HaveLen(1))
But it doesn't work because all containers were marked for update with this:
```go
var containersToUpdate []types.Container
for _, c := range containers {
	if !c.IsMonitorOnly(params) {
		containersToUpdate = append(containersToUpdate, c)
		progress.MarkForUpdate(c.ID())
	}
}
```
So I will look for another way to write this test, but I think it would be helpful to define what each state means, since I don't understand why something would be marked as updated before it's actually updated or, in the case of non-stale containers, is even considered for update. I was expecting that only the containers that required an update (because they're stale, or for any other reason) would be added to containersToUpdate, and that only those containers would be run through Update(). MarkForUpdate() seems to assume that everything will ultimately be updated and pre-emptively records that state as though the update had completed; another way to do it might be to update the state only at the end of a successful Update().
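To sketch that last idea (again with made-up, simplified names rather than the real report/progress types): the "updated" state would only be recorded once a container's update actually succeeds, so something like report.Updated() counts real updates.

```go
package main

import "fmt"

// progress is a simplified stand-in for the session report: a container is
// only counted as updated after its update actually succeeds.
type progress struct {
	updated []string
	failed  []string
}

func (p *progress) Updated() []string { return p.updated }
func (p *progress) Failed() []string  { return p.failed }

// update stands in for the real per-container update and simply succeeds
// or fails according to the flag.
func update(id string, succeed bool) error {
	if !succeed {
		return fmt.Errorf("update of %s failed", id)
	}
	return nil
}

func main() {
	p := &progress{}
	toUpdate := map[string]bool{"app-a": true, "app-b": false}

	for id, ok := range toUpdate {
		// The state transition happens after the outcome is known,
		// not pre-emptively when the container is merely queued.
		if err := update(id, ok); err != nil {
			p.failed = append(p.failed, id)
			continue
		}
		p.updated = append(p.updated, id)
	}

	fmt.Println(len(p.Updated())) // 1: only the container that really updated
	fmt.Println(len(p.Failed()))  // 1
}
```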
Anyway thanks again for the update. I'll try to add more specific thoughts on your PR in case you want to consider any of this. If not, that's fine too!
@piksel I added a few comments on your PR. Overall I find it easier to understand the logic with your changes, especially the SetMarkedForUpdate(). Once that merges in I can update my branch to match those changes pretty easily.
Added test cases and improved the report to include the number of deferred containers; renamed the parameter to --defer-days to better reflect its purpose and align with the status shown in the report. I'll pause on this now until you're done with the other PR so we can integrate the two then. Thanks!
Hi @pjdubya, any updates on this?