nomad
Prefetch image before killing the running allocation on task update
Nomad version

Any (in particular, 0.8.7)
Issue
When a task is being updated (in particular, when the underlying Docker image is being updated), Nomad kills the currently running allocation (according to the update policy) and only then starts the new allocation; only at that point is the new image pulled. This can result in a service interruption when the service must run as a singleton, or when the update is configured incorrectly (i.e., the image fails to pull within the update interval): the old allocation is already killed, but the new one is still pulling the image.
Would it make sense to have an additional parameter in the `update` stanza that would make Nomad prefetch the new image and only then start the rollover? Or, taking it one step further, would it make sense to make Nomad kill the existing allocation only once the new one is reporting "healthy" service checks?
This would certainly help when trying to update system jobs, especially when the image is rather large.
Canary deployments could help with this: the canary starts before the present allocation is killed.
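For service jobs, this canary behavior is already expressible in the `update` stanza. A minimal sketch (job name and values are illustrative, not from this thread):

```hcl
job "web" {
  type = "service"

  update {
    canary       = 1        # start the replacement alloc alongside the old one
    max_parallel = 1
    health_check = "checks" # wait for service checks to pass
    auto_promote = true     # replace the old alloc only after the canary is healthy
  }

  # group/task definitions omitted for brevity
}
```

With this configuration the image pull happens in the canary allocation while the old allocation keeps serving traffic; the limitation discussed in this issue is that system jobs don't support canaries.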
I don't think system jobs support canaries, although according to the documentation that might be supported at a later date.
Some of our system tasks also have static port mapping.
Enterprise support customer here:
This seems like something that would be quite useful for us. We currently deploy these - unfortunately - very massive Windows images that take quite a while to download. This causes fun issues where sometimes the registry times out - Windows image support isn't super great on certain registries, especially for multi-gigabyte images - or the container may fail to start for whatever reason, causing outages if that container cannot run more than 1 allocation at a time for whatever reason.
Ideally, we pull the image prior to a deploy, so that the only thing left is to start the container once the previous one is stopped. This decreases the downtime between deploys, and also ensures that we don't hit pull issues when the cache isn't fresh for some of those larger images.
This issue hits us as well.
In our setup, there is a slow, laggy network that we can't control, so a Docker image pull can take 2-3 hours, and `docker pull` can also hang or fail with 'EOF', leaving us in the middle of a 2-3h deployment with a DEAD replica. As a result, we always get HOURS of downtime for each replica on each deploy, because Nomad kills the alloc first and only then starts the slow docker pull.
Also, Nomad cleans up unused Docker images so quickly that when you redeploy an alloc that previously failed or gave up pulling, it starts pulling the image again from scratch, because Nomad has already cleaned up all of that image's layers. This creates an endless loop.
This is strange after the k8s experience, because k8s always pulls the required images before touching any running container, thus reducing the downtime of each container replica. It is especially strange when Nomad kills some of your critical allocations, then tries to pull the image... and the pull hangs or fails due to, e.g., a bad network or a typo in the image tag.
Even if you don't run the app as a singleton and you have a faster network or smaller images - why should a replica of your app be offline for minutes instead of seconds during each deploy while Nomad pulls the image?
Sorry for the silence from HashiCorp on this! To be clear, we think it's a great idea.
For the most straightforward case, where an image is reused when a task is being updated on a node, `image_delay` should be sufficient to avoid re-pulling. If it's not working as intended, please file a new issue with repro steps and/or logs!
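For reference, `image_delay` lives in the Docker driver's plugin block in the Nomad client configuration; a sketch with an illustrative value:

```hcl
# Nomad client (agent) configuration
plugin "docker" {
  config {
    gc {
      image       = true
      image_delay = "1h"  # keep unused images cached for an hour before GC
    }
  }
}
```

A longer delay trades disk space for faster task updates, since the old layers stay in the local Docker cache across redeploys.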
Sadly, Nomad still does not have a solution for the more general case of leaving existing workloads in place until their replacements are ready to run (in particular: until their images are pulled). This would require new points of coordination in Nomad:
- A `NextAllocWatcher` to block shutdown signals until the replacement alloc has been set up (the inverse of `PrevAllocWatcher`).
- A `DriverPlugin.Prestart` hook for drivers to perform prestart tasks like image pulling. `NextAllocWatcher` on the old alloc would block shutdown until `Prestart` on the new alloc completes. (The new alloc may even proceed to block on `PrevAllocWatcher` for the old alloc to shut down! That should Just Work as implemented today.)
- Alternatively, add a new artifact management capability to give operators control over when images and other artifacts are pulled and gc'd.
Hack: sysbatch image pulls
Nomad v1.2.0 implemented system batch (`sysbatch`) jobs, which, like system jobs, run on all nodes by default but, like batch jobs, are expected to exit and not be restarted if they exit without an error.
You can use a sysbatch job that requires the same image as your service, but specifies a different entrypoint to exit immediately instead of running your target service. Once the sysbatch job completes, you can deploy your service and the image will already be pulled!
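A minimal sketch of such a prefetch job (the job name and image are placeholders, and `command = "/bin/true"` assumes the image ships a binary that exits immediately; substitute whatever no-op your image provides):

```hcl
job "prefetch-myapp" {
  type = "sysbatch"  # runs once on every eligible node

  group "pull" {
    task "pull" {
      driver = "docker"

      config {
        # Same image your service job uses, so the layers land in the cache.
        image   = "registry.example.com/myapp:v2"
        # Override the entrypoint so the task exits immediately after the pull.
        command = "/bin/true"
      }
    }
  }
}
```

Run this job, wait for it to complete on all nodes, then deploy the service that references the same image tag.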
The major caveat here is that, by default, the image will only be cached for 3 minutes after the sysbatch job exits. This can be tuned by configuring `image_delay` on a per-node basis, but it still requires you to configure your nodes with knowledge of your workload's image pulling. Very awkward and error prone.
I want a real solution for this, but hopefully this workaround helps some in the meantime.
Any updates on this?
Hi @valafon, we do not have any further update unfortunately. Once we do, a member of the Nomad team will provide an update via this issue.