
Support auto-updates of containers

Open MrDrMcCoy opened this issue 9 months ago • 8 comments

Upstream issue here: https://github.com/hashicorp/nomad/issues/18440

Problem

Running a container whose registry replaces and updates tags like latest or a major/minor version number leaves the container that Nomad schedules via Podman stuck at whatever version was initially pulled. This is cumbersome and hides the fact that containers are out of date.

Attempted solutions

  • Changing the jobspec to initiate a restart does not trigger a pull for image updates.
  • Manually requesting a reschedule and restart also does not trigger a pull.
  • Adding the "io.containers.autoupdate" = "registry" label to the task does not enable Podman's auto-update feature.
  • Enabling force_pull does allow the container to update, but this significantly increases container start time which can be problematic.
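For reference, the auto-update label attempt above would look roughly like this in a jobspec (a sketch; the task and image names are hypothetical, and `labels` is the podman driver's option for setting container labels):

```hcl
task "app" {
  driver = "podman"

  config {
    image = "docker.io/library/myapp:latest" # hypothetical image

    # Podman's native auto-update opt-in label; has no effect under Nomad
    labels = {
      "io.containers.autoupdate" = "registry"
    }
  }
}
```

Podman's own `podman auto-update` only acts on containers created from systemd units carrying that label, which is why setting the label on a Nomad-managed container does nothing by itself.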

Desired solution

Driver config

  • image_auto_pull (bool) defaults to false: Sets the default behavior for checking registries and pulling image updates for defined tags.
  • ~~image_auto_pull_eager (bool) defaults to false: Sets the default behavior for this node to pull all defined images in Nomad, even if an image is not scheduled to run on it. This allows containers to start quickly when being scheduled on a node that has not yet run it.~~ not possible
  • image_auto_prune (bool) defaults to false: Sets the default behavior for this node to prune image layers that belong to image tags which are not defined by any tasks or running containers. This occurs at the end of image_auto_pull_interval for any successfully pulled images.
  • image_auto_pull_interval (int) defaults to 86400 (one day in seconds): Sets the default interval for checking registries for updates to existing image tags.
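Under this proposal, the client agent's plugin block might look like the following (a sketch of hypothetical, not-yet-implemented options; defaults shown):

```hcl
plugin "nomad-driver-podman" {
  config {
    # Proposed options -- not implemented
    image_auto_pull          = false # check registries for updated tags
    image_auto_prune         = false # prune layers from superseded tags
    image_auto_pull_interval = 86400 # seconds (one day)
  }
}
```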

Task config

  • image_auto_pull (bool) defaults to false: Sets the task behavior for checking registries and pulling image updates for defined tags.
  • ~~image_auto_pull_eager (bool) defaults to false: Sets the task behavior for all nodes to pull the defined image, even if the image is not scheduled to run on a node. This allows containers to start quickly when being scheduled on a node that has not yet run it.~~ not possible.
  • image_auto_pull_interval (int) defaults to 86400 (one day in seconds): Sets the task interval for checking registries for updates to existing image tags.
  • container_auto_update (bool) defaults to false: Initiates a task restart when new images have been pulled in accordance with defined update config in Jobspec.
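The corresponding task-level overrides might look like this (again a sketch of hypothetical options; the task and image names are placeholders):

```hcl
task "app" {
  driver = "podman"

  config {
    image = "docker.io/library/myapp:latest" # hypothetical image

    # Proposed options -- not implemented
    image_auto_pull          = true
    image_auto_pull_interval = 3600 # override the node default
    container_auto_update    = true # restart per the job's update block
  }
}
```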

Globally (should be documented somewhere)

  • If a task is restarted without force_pull, it will start with the most recent image locally available without checking the upstream registry.
  • Containers that set image_auto_pull but not container_auto_update will get the new image version when restarted. Rescheduling or scaling a task will maintain the previous image tag.
  • ~~The Podman driver keeps track of hashes on each node to ensure the same version comes up for each task. This avoids the scenario where a single node deploys a tag and another node that never pulled it gets a newer version of the same tag when scaling up or rescheduling, potentially running an application with mismatched versions.~~ not possible.

Other notes

image_auto_pull_eager and image_auto_prune are admittedly a bit of scope creep, but are related QoL features that might be worth adding at the same time.

MrDrMcCoy avatar Jul 08 '25 03:07 MrDrMcCoy

Hi @MrDrMcCoy, thank you for this well-thought-out suggestion. It seems like a reasonable ask, but we will need to talk about it with the team. I will make sure to put this one up for discussion and roadmapping.

Juanadelacuesta avatar Jul 08 '25 12:07 Juanadelacuesta

The Podman driver keeps track of hashes on each node to ensure the same version comes up for each task. This avoids the scenario where a single node deploys a tag and another node that never pulled it gets a newer version of the same tag when scaling up or rescheduling, potentially running an application with mismatched versions.

This particular problem cannot be solved within the driver, because the instances of a driver can't coordinate with each other. Any coordination has to be done in the Nomad control plane. If we follow where this idea leads a bit, we'll also see that folks are going to want updates of the image to respect the update block, so that you're not restarting all your tasks at the same time. And that also needs to be coordinated in the Nomad control plane.

This has been discussed a good bit in https://github.com/hashicorp/nomad/issues/18440 and https://github.com/hashicorp/nomad/issues/13061. As you can see, this feature effectively boils down to "periodically redeploy my job".

tgross avatar Jul 08 '25 12:07 tgross

This particular problem cannot be solved within the driver, because the instances of a driver can't coordinate with each other. Any coordination has to be done in the Nomad control plane.

Bummer that the drivers can't see each other. Does Nomad itself keep track of which image hash is tied to a task, or otherwise allow the driver to submit metadata to Nomad in a way that can be read by the other instances? It's not critical, since the ability to auto-upgrade containers allows for eventual consistency, but would be very nice to have for applications that need to be scaled identically.

Of course, the problem of keeping image hashes identical likely predates this driver entirely. If one node is told to grab latest, a new latest is pushed, and Nomad then scales up on a node that has never seen the tag before, the result is mismatched versions. I can replicate this in my environment. If this behavior exists with other drivers, and I expect it does, the Podman team can be the first ones to fix it ;-)

As you can see, this feature effectively boils down to "periodically redeploy my job".

Indeed, and this is not unique to Nomad. All containers need to be replaced on update, so this is expected. The suggestion is to do this and/or stage the updated tag for the next Jobspec update or restart.

MrDrMcCoy avatar Jul 09 '25 00:07 MrDrMcCoy

Does Nomad itself keep track of which image hash is tied to a task, or otherwise allow the driver to submit metadata to Nomad in a way that can be read by the other instances?

No, the control plane is generally pretty ignorant of what's going on in the task driver. Even the task.config block is currently treated as an opaque blob and can't be validated before it gets to the client. The root design issue here is that the task driver is loosely coupled to the Nomad control plane, and even to the Nomad client. That's a strength in terms of flexibility (task drivers can do anything; one of my colleagues once wrote a silly hack where each "task" was a MIDI track), but a weakness in terms of coordinating between them. In theory a task driver could publish Task Events, but there's no way for another instance of the driver to get that back out today.

Of course the problem of keeping image hashes identical is one that likely predates this driver entirely. If a node is told to grab latest on one machine, a new latest is pushed, and Nomad scales up on a new node that's never seen the tag before, it will result in mismatched versions

Yup, as intended! You really don't want to be using latest if you want control over deployments. You should be pinning to a specific tag or hash and then deploying that. The way I did this before I joined the team, when I was running Nomad in my own production environment, was to have my CI system fill in the image hash prior to submitting the job. With HCL2 variables, that's even easier now.
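The CI pattern described here can be sketched with an HCL2 variable (the job, task, and image names are hypothetical):

```hcl
variable "image_digest" {
  type = string
  # CI supplies the digest, e.g.:
  #   nomad job run -var 'image_digest=sha256:...' app.nomad.hcl
}

job "app" {
  group "app" {
    task "app" {
      driver = "podman"

      config {
        # Pin by digest rather than a mutable tag
        image = "docker.io/library/myapp@${var.image_digest}"
      }
    }
  }
}
```

Because the digest is immutable, every node that schedules the task pulls byte-identical content, which sidesteps the mismatched-version scenario above.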

tgross avatar Jul 09 '25 13:07 tgross

You really don't want to be using latest if you want control over deployments. You should be pinning to a specific tag/hash and then deploying that.

As true as this is, many third-party projects that stacks rely on don't tag their containers properly, and it is a large maintenance burden to set up your own pipeline to freeze or rebuild their containers. Requiring correctness is not sustainable.

MrDrMcCoy avatar Jul 09 '25 17:07 MrDrMcCoy

As true as this is, many third-party projects that stacks rely on don't tag their containers properly, and it is a large maintenance burden to set up your own pipeline to freeze or rebuild their containers. Requiring correctness is not sustainable.

Sure, fair enough, the world is imperfect.

As far as this particular GitHub issue goes, it seems safe to close this out? This will ultimately have to be work done in Nomad core and not the plugin.

tgross avatar Jul 09 '25 18:07 tgross

This will ultimately have to be work done in Nomad core and not the plugin.

Couldn't the driver have an option to blindly pull and replace containers on a timer? It would be better than nothing and still offer eventual consistency for certain use cases.

MrDrMcCoy avatar Jul 09 '25 18:07 MrDrMcCoy

Couldn't the driver have an option to blindly pull and replace containers on a timer? It would be better than nothing and still offer eventual consistency for certain use cases.

It's probably technically possible. It doesn't look like the podman driver has the complication the Docker driver does of having to create its own network namespaces.

From a design standpoint I think what will happen is users will turn it on and then get a nasty surprise when they take an outage because of the lack of coordination, so I'm not enthusiastic about the idea. But we can keep this open and if someone from the community wants to give it a go we can discuss it.

tgross avatar Jul 09 '25 19:07 tgross