use k8s cluster default imagePullPolicy
Reproducer
- build a Docker image "test:123" that just runs "echo 123", and "docker push" it to the registry;
- create the deployment (send-manifest);
- get the logs from the provider (provider lease-logs);
- close the deployment;
- repeat the first 4 steps, but modify the image in step 1 (say, "echo 555") while keeping the same image tag "test:123".
Update: as the discussion below shows, this is mostly about the image pull policy for images tagged :latest and for untagged images. Other tags, such as test:123, should stay 1:1 (immutable).
Expected behavior:
The provider should pull the new image if it is tagged :latest or untagged.
Actual behavior:
The provider does not pull the new image; it starts the old one.
Workaround:
There is no workaround for the changed default K8s behavior.
Provider in question:
I haven't tested different providers.
"equinix-metal-ams1","akash","mn2-ng","https://provider.provider-0.prod.ams1.akash.pub:8443","akash14c4ng96vdle6tae8r4hc2w4ujwrsh3x9tuudk0"
cc @dmikey
@arno01, this is expected behavior. You need to use a different tag to deploy updates. Images are heavily cached; pushing changes to a single tag can very easily result in both versions running, even if we forced a pull every time.
@boz, this should be mentioned in the Akash Docs then, since I've already seen two cases where people could not figure out why their image works on one provider but not on another (1, 2).
Moreover, this breaks the default Kubernetes behavior for the :latest image tag, since images with this tag should always be re-pulled:
- omit the imagePullPolicy and use :latest as the tag for the image to use; Kubernetes will set the policy to Always.
Source: https://kubernetes.io/docs/concepts/containers/images/#updating-images
I would actually be more inclined to let people specify the imagePullPolicy in their deployment manifests, to make this more flexible.
Also, most people are going to be using the same tags, e.g. nginx:stable, nginx:mainline, nginx:latest, nginx:1.21 (nginx can be replaced with practically any image).
The issue I see with the providers heavily caching images is that fixes for security issues in the images' libraries (and, in some cases, in the apps themselves, e.g. nginx:stable, nginx:1, nginx:latest) will not reach people when they re-deploy after being alerted by CVE reports.
It is best practice to use a version tag. We can add that to the docs.
It says as much right in your source
Caution: You should avoid using the latest tag when deploying containers in production, as it is harder to track which version of the image is running and more difficult to roll back to a working version.
Instead, specify a meaningful tag such as v1.42.0.
@boz, I agree with that recommendation; however, the concern I've raised in this issue has nothing to do with it.
That is a recommendation for running production containers, which many people are not doing.
Many (probably even the majority) are going to use no tag, which defaults to :latest. (People usually modify their images and then push them again and again with the :latest tag.)
(Edit: And this is not only about the :latest tag. See Update 1 below.)
As I'm typing this, I'm seeing a third case of people struggling because of this, and I'm having to explain to them that Akash providers heavily cache images and that they should use a new tag for every image update.
My point is not to lobby for using the :latest tag, but rather to fix the imagePullPolicy, which is currently not set to Always for untagged images (or images tagged :latest) on the Akash providers.
I suggest sticking with the default imagePullPolicy; otherwise more and more people will hit this issue, there is no doubt about it.
Update 1:
The same applies to image tags other than :latest.
Plenty of projects leverage CI/CD to constantly build and re-push the same tags; when that happens, the libraries get updated (e.g. the openssl library in the nginx container).
In this case, the image tag always stays the same, just as I've already described above: https://github.com/ovrclk/akash/issues/1354#issuecomment-907866119
This leads to security issues when an Akash provider caches these images and does not re-pull them.
Basically, this deviation from the defaults is not just a nuisance but a security issue, because the images are not re-pulled. (Simplest example: nginx:stable won't get re-pulled, so users won't get security updates when they redeploy, and people making their first nginx:stable deployment will likely receive a stale cached copy of nginx:stable [which may turn out to be not as :stable as it sounds ;)])
100% agree with @arno01.
I wish :latest worked out of the box.
My recommendation back when mainnet launched was to block :latest, :stable, etc. Unfortunately, the labels don't actually mean anything. For example, there is no guarantee that :latest is in fact the latest.
Regarding "there is no guarantee that :latest is in fact the latest": it should be guaranteed at the moment someone deploys the app, which is not the case right now because of imagePullPolicy: IfNotPresent.
It won't mean it is still the :latest some time after the app was deployed.
This is how K8s works, and has always worked by default everywhere, for the :latest tag, as I mentioned above.
I'd prefer sticking with the widely accepted defaults (for imagePullPolicy) rather than inventing something new.
And should we want :latest to actually mean the image will always be the latest, we should probably look into implementing something like https://keel.sh, which would automatically take care of updating the images whenever a new image is pushed behind the :latest tag. But there should be a toggle for that, e.g. services.<name>.image_auto_update: true.
As a compromise, the imagePullPolicy could be set via the SDL, e.g. services.<name>.imagePullPolicy: always.
To get the default image pull policy, we only need to remove this line: https://github.com/ovrclk/akash/blob/v0.16.4/provider/cluster/kube/builder/workload.go#L56
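A minimal sketch of the builder side, assuming the referenced line hardcodes IfNotPresent for every container (the thread notes the provider sets PullIfNotPresent by default). Container is a hypothetical, trimmed-down stand-in for corev1.Container, and pullPolicyFromSDL maps the hypothetical services.<name>.imagePullPolicy SDL field suggested above. When nothing is specified, the field is left empty so the API server applies its own defaulting.

```go
package main

import (
	"fmt"
	"strings"
)

// Container is a hypothetical stand-in for corev1.Container.
type Container struct {
	Name            string
	Image           string
	ImagePullPolicy string // "" lets the API server apply its defaults
}

// pullPolicyFromSDL maps a hypothetical SDL value onto a Kubernetes pull
// policy name. An empty value stays empty instead of becoming IfNotPresent.
func pullPolicyFromSDL(v string) (string, error) {
	switch strings.ToLower(v) {
	case "":
		return "", nil
	case "always":
		return "Always", nil
	case "ifnotpresent":
		return "IfNotPresent", nil
	case "never":
		return "Never", nil
	default:
		return "", fmt.Errorf("unsupported imagePullPolicy %q", v)
	}
}

// buildContainer sets ImagePullPolicy only when the user asked for one,
// rather than hardcoding IfNotPresent for every workload.
func buildContainer(name, image, sdlPolicy string) (Container, error) {
	policy, err := pullPolicyFromSDL(sdlPolicy)
	if err != nil {
		return Container{}, err
	}
	return Container{Name: name, Image: image, ImagePullPolicy: policy}, nil
}

func main() {
	c, _ := buildContainer("web", "nginx:latest", "")
	fmt.Printf("%q\n", c.ImagePullPolicy) // ""

	c, _ = buildContainer("web", "nginx:1.21", "always")
	fmt.Println(c.ImagePullPolicy) // Always

	_, err := buildContainer("web", "nginx", "sometimes")
	fmt.Println(err) // unsupported imagePullPolicy "sometimes"
}
```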
From the doc
When you first create a Deployment, StatefulSet, Pod, or other object that includes a Pod template, then by default the pull policy of all containers in that pod will be set to IfNotPresent if it is not explicitly specified. This policy causes the kubelet to skip pulling an image if it already exists.
So K8s sets PullIfNotPresent by default, just like we do, except that when we do it, we ignore these special conditions:
Default image pull policy
When you (or a controller) submit a new Pod to the API server, your cluster sets the imagePullPolicy field when specific conditions are met:
- if you omit the imagePullPolicy field, and the tag for the container image is :latest, imagePullPolicy is automatically set to Always;
- if you omit the imagePullPolicy field, and you don't specify the tag for the container image, imagePullPolicy is automatically set to Always;
- if you omit the imagePullPolicy field, and you specify the tag for the container image that isn't :latest, the imagePullPolicy is automatically set to IfNotPresent.
https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting
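The three defaulting rules above can be expressed as a small function. This is a simplified sketch, not the Kubernetes implementation: imageTag is a hypothetical helper that takes whatever follows the last ':' in the final path segment (so a registry port such as localhost:5000 is not mistaken for a tag), and treating digest-only references as pinned (IfNotPresent) reflects my reading of upstream behavior rather than the quoted list.

```go
package main

import (
	"fmt"
	"strings"
)

// imageTag returns the tag of an image reference ("" if none) and whether
// it carries an @digest suffix. Simplified parsing, not a full reference parser.
func imageTag(image string) (tag string, hasDigest bool) {
	if i := strings.Index(image, "@"); i >= 0 {
		image, hasDigest = image[:i], true
	}
	// Only the last path segment can hold the tag; earlier ':' may be a port.
	last := image[strings.LastIndex(image, "/")+1:]
	if i := strings.LastIndex(last, ":"); i >= 0 {
		return last[i+1:], hasDigest
	}
	return "", hasDigest
}

// defaultPullPolicy applies the defaulting rules to a container whose
// imagePullPolicy may have been omitted (explicit == "").
func defaultPullPolicy(image, explicit string) string {
	if explicit != "" {
		return explicit // an explicit policy is never overridden
	}
	tag, hasDigest := imageTag(image)
	switch {
	case tag == "latest":
		return "Always"
	case tag == "" && !hasDigest:
		return "Always" // untagged implies :latest
	default:
		return "IfNotPresent" // assumption: digest-only references count as pinned
	}
}

func main() {
	for _, img := range []string{"nginx", "nginx:latest", "test:123", "localhost:5000/test", "nginx@sha256:abc"} {
		fmt.Println(img, "->", defaultPullPolicy(img, ""))
	}
}
```

Only the "unset" case is defaulted; this is why removing the hardcoded policy from the provider's builder would be enough to get the standard behavior back.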
As @tidrolpolelsef said, it's one of those weird config options where 'unset' has its own unique meaning :-)
PR https://github.com/ovrclk/provider-services/pull/54
FWIW, I've tested this further: apparently HEAD requests are used to obtain the image reference, and they don't trigger the Docker Hub rate limiter upon Pod restart, even with imagePullPolicy: Always, as long as the image reference hasn't been updated on the remote.
https://asciinema.org/a/541059
The Docker Hub API isn't too restrictive (100 pulls per 6 hours per IP address). Source
Wanted to update this thread after my discussion with @boz today.
Pros of supporting/allowing users to use the "latest" tag in SDLs (and pulling a fresh, uncached image when it is used or when no tag is specified):
- Allows autoupdating of container images in cases like presearch
- Allows us to potentially build a CI/CD style pipeline where new artifacts get deployed automatically to akash
- Allows easier integration with solutions like Fleek.co
- Is consistent with Kubernetes default behavior
For these reasons, we are going to work on allowing this. The only concern/downside is making sure we document this behavior clearly and note that a running pod will not be updated unless the pod is terminated (forcing a new pod to be spun up, which will result in the latest image being pulled down). I believe this is how solutions like the presearch "autoupdater" handle it anyway (their autoupdater is a fork of https://github.com/containrrr/watchtower).
Regarding "forcing a new pod to be spun up": the pod restart can easily be triggered from within the pod itself or from the outside (through lease-shell, then kill <child process of PID 1> inside the pod).
Either way, this can also be scheduled.