docker-lock
Idea: docker-lock migrate
I've been thinking about building something similar to this that includes the ability to copy images to an alternative registry, rather than just resolving tags to digests.
Abstractly, I'd love to have some way to map various functions over collections that contain image references.
You've already implemented support for lots of collections (Dockerfiles, docker-compose files, and kubernetes manifests) and two functions (rewrite and verify). We could add a migrate function (we can bikeshed the name) that calls crane.Copy, too.
Some applications:
- Air-gapped/locked-down clusters that can only pull from a specific registry.
- Copying all your dependencies to a closer registry (for availability, latency, and rate limiting reasons).
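To make the idea concrete, here is a rough sketch of what such a migrate loop could look like. Only `crane.Copy`'s shape (source reference in, destination reference out) comes from the library; `migrate`, `copyFunc`, and the naive rename are invented for illustration, and the copy function is injected so the sketch runs without touching a registry.

```go
package main

import "fmt"

// copyFunc matches the signature of crane.Copy from
// github.com/google/go-containerregistry/pkg/crane, injected here so the
// loop can be sketched without network access.
type copyFunc func(src, dst string) error

// migrate is a hypothetical sketch of the proposed function: it copies each
// image reference into the private registry under dstRoot and returns the
// new references. It is not docker-lock's actual API.
func migrate(refs []string, dstRoot string, copyImage copyFunc) ([]string, error) {
	var migrated []string
	for _, src := range refs {
		dst := dstRoot + "/" + src // naive rename strategy, for illustration only
		if err := copyImage(src, dst); err != nil {
			return nil, fmt.Errorf("copying %s to %s: %w", src, dst, err)
		}
		migrated = append(migrated, dst)
	}
	return migrated, nil
}

func main() {
	// In real use the fake would be crane.Copy itself.
	fake := func(src, dst string) error {
		fmt.Printf("crane.Copy(%q, %q)\n", src, dst)
		return nil
	}
	out, err := migrate([]string{"docker.io/library/redis:6"}, "private.example.com/mirror", fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```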
I put together a proof of concept a while back that only worked for kubernetes manifests: https://github.com/google/ko/issues/11
What do you think?
cc @imjasonh
I think this is an excellent idea - it would make migrating to a private registry much easier.
Let me know if this somewhat matches what you had in mind in terms of workflow:
- The user has a project that uses base images from public registry(s) like Dockerhub.
- The user runs `docker lock generate` to create a Lockfile. The Lockfile records all the relevant information about the public base images.
- The user runs `docker lock migrate <flags about private registry, such as auth info>`. It reads the Lockfile and uses `crane.Copy` to copy all the base images into the private registry.
- The Lockfile is updated to record information about both the original public registry and the new private registry.
- The user runs `docker lock rewrite` and it rewrites all the base images to reference those in the private registry.
Now, let’s assume that an update has been pushed to the public registry(s):
- The user runs `docker lock generate`, which updates info about the images from the public registry but leaves the private registry untouched (technically it generates a new Lockfile, but the effect is the same).
- The user runs `docker lock migrate`, which reads the Lockfile and copies the updated images to the private registry.
- … The same workflow from the step above …
What do you think?
flags about private registry, such as auth info
I would expect this not to require flags and just use the docker auth config file, in the same way that crane.Digest works.
The Lockfile is updated to have both information about the original public registry and the new private registry
Seems reasonable. I haven't looked at the lockfile format, so I'm not sure how this would affect existing datastructures.
How would you handle migrating to multiple private registries? Say I want to have geo-redundant k8s clusters that each pull from their nearest registry. For GCR, you could rewrite manifests three times to point to eu.gcr.io, asia.gcr.io, or us.gcr.io, deploying each to their respective continent. Would you want to track that as three separate lockfiles? Or a single lockfile with multiple downstreams, somehow?
The user runs docker lock rewrite and it rewrites all the base images to reference those in the private registry.
With some flag or something to select which registry should be used in the rewrite? Or would that decision happen during docker lock migrate?
Now, let’s assume that an update...
This seems like a reasonable workflow -- one nice thing about tracking both upstream and downstream in lockfiles is that you could do some diffing client-side to skip migrating anything that's already downstream. That might complicate things, though.
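The diffing could be as simple as comparing recorded digests. This is only a sketch of the idea with hypothetical names, not docker-lock code:

```go
package main

import (
	"fmt"
	"sort"
)

// needsCopy compares digests from a freshly generated Lockfile (upstream)
// against what was last recorded as present in the private registry
// (downstream), returning only the references that changed or were never
// copied.
func needsCopy(upstream, downstream map[string]string) []string {
	var out []string
	for ref, digest := range upstream {
		if downstream[ref] != digest {
			out = append(out, ref)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	upstream := map[string]string{
		"docker.io/library/redis": "sha256:new",
		"docker.io/library/nginx": "sha256:same",
	}
	downstream := map[string]string{
		"docker.io/library/redis": "sha256:old",
		"docker.io/library/nginx": "sha256:same",
	}
	fmt.Println(needsCopy(upstream, downstream)) // [docker.io/library/redis]
}
```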
I would expect this not to require flags and just use the docker auth config file, in the same way that `crane.Digest` works.
My apologies, that was a lapse on my part - yes, currently docker-lock just uses the auth info from the docker config file. No flags would be needed (in older versions, before I migrated to crane, this had to be configured manually).
How would you handle migrating to multiple private registries?
I am not super familiar with the geo-redundant case, but shouldn't this be handled by the registry itself? Quickly reading up on Azure (I am more familiar with it), for Azure Container Registry it appears as though you can just use one URL and it will pull from the ideal replica for you.
That said, I understand the case for having multiple replicas with multiple URLs. An alternative to having multiple Lockfiles as you suggested would be to have docker lock migrate accept flags, such as `docker lock migrate --downstream-registries=us.gcr.io,asia.gcr.io`. This could produce a Lockfile that has a key-value pair that looks like
downstream: [us.gcr.io, asia.gcr.io]
Then, when running docker lock rewrite, it could produce multiple Dockerfiles such as Dockerfile.us.gcr.io or Dockerfile.asia.gcr.io.
Just a thought, not wedded to any solution, but I think it might be more ergonomic to always just have one Lockfile.
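A rough sketch of how the per-registry rewrite could work. This is a hypothetical helper, not docker-lock's actual rewriter, and it only handles simple `FROM <image>` lines:

```go
package main

import (
	"fmt"
	"strings"
)

// rewritePerRegistry sketches the Dockerfile.us.gcr.io idea: for each
// downstream registry, rewrite FROM lines so the base image is pulled from
// that registry, and key the result by the suffixed filename.
func rewritePerRegistry(dockerfile string, downstreams []string) map[string]string {
	out := make(map[string]string)
	for _, reg := range downstreams {
		var lines []string
		for _, line := range strings.Split(dockerfile, "\n") {
			if strings.HasPrefix(line, "FROM ") {
				line = "FROM " + reg + "/" + strings.TrimPrefix(line, "FROM ")
			}
			lines = append(lines, line)
		}
		out["Dockerfile."+reg] = strings.Join(lines, "\n")
	}
	return out
}

func main() {
	files := rewritePerRegistry("FROM redis:6\nRUN echo hi", []string{"us.gcr.io", "asia.gcr.io"})
	for name, content := range files {
		fmt.Printf("%s:\n%s\n\n", name, content)
	}
}
```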
you could do some diffing client-side to skip migrating anything that's already downstream
As for client side diffing, I assume the goal is to have every replica registry contain the same images, so if an image already exists, docker-lock could just skip the crane.Copy step. (Trying to reduce the amount of work someone needs to do to use docker-lock - which is why all the current flags are implemented, instead of having people pair commands such as find with docker-lock).
I haven't looked at the lockfile format
Here is the Lockfile used in this project. It only uses Dockerhub, but following the README, you can generate them for sample projects using your own private registry.
I am not super familiar with the geo-redundant case, but shouldn't this be handled by the registry itself? Quickly reading up on Azure (I am more familiar with it), for Azure Container Registry it appears as though you can just use one URL and it will pull from the ideal replica for you.
Indeed, this is how a lot of registries work, but it introduces a single point of failure at the DNS or load balancer level. Sometimes it's nice to have complete isolation between two environments, which generally means you'll need multiple image references for the "same" workload. (There are other ways to accomplish similar things, I'm just brainstorming here.)
Just a thought, not wed to any solution, but I think it might be more ergonomic to always just have one Lockfile.
That seems reasonable to me. For the use case I really have in mind, this is sufficient. What I want is this:
- I have a bunch of kubernetes yaml.
- I want to run it on my cluster.
- My cluster can only pull from private.example.com.
docker-lock helps me copy everything I need from that yaml into private.example.com and rewrites the kubernetes manifest images to point to private.example.com instead of wherever they came from originally.
I don't know that I really need the lockfile, but it seems integral to how docker-lock functions currently, and I don't think it really hurts anything to have it. I would defer to you for the best UX here.
One thing I haven't solved is how to rename images across registries. Ideally, you could mirror the structure of the source:
docker.io/library/foo -> private.example.com/library/foo
But, what if we also have gcr.io/library/foo in an image? There would be a collision.
One nice thing is that the collisions don't really matter if you're pulling by digest, but it's something to consider (especially if we're copying tags over).
ko has worked around this in a bunch of terrible ways with different naming strategy flags -- maybe this could be solved using go templates or something to let users specify how things should be renamed.
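A minimal sketch of the template idea using Go's standard `text/template`; the `Ref` fields and `rename` helper are invented for illustration, not an API from ko or docker-lock:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Ref holds the parsed pieces of a source image reference.
type Ref struct {
	Registry string // e.g. "gcr.io"
	Path     string // e.g. "library/foo"
	Tag      string // e.g. "bar"
}

// rename executes a user-supplied Go template against the parsed reference,
// letting users specify their own renaming scheme.
func rename(tmpl string, ref Ref) (string, error) {
	t, err := template.New("rename").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, ref); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := rename("dst.example.com/{{.Registry}}/{{.Path}}:{{.Tag}}",
		Ref{Registry: "gcr.io", Path: "library/foo", Tag: "bar"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // dst.example.com/gcr.io/library/foo:bar
}
```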
I assume the goal is to have every replica registry contain the same images, so if an image already exists, docker-lock could just skip the crane.Copy step.
Yep, exactly.
I don't know that I really need the lockfile, but it seems integral to how docker-lock functions currently, and I don't think it really hurts anything to have it. I would defer to you for the best UX here.
This was raised in the other open issue, and I tend to agree that in many cases you don't need the Lockfile. When I developed this (for my own use case), I thought it would be nice to keep the hash information out of the Dockerfiles so that they would remain as readable as possible. As the project evolved, I am still 50/50 on whether this is necessary, but currently it is how it works.
One thing I haven't solved is how to rename images across registries.
This seems pretty hairy and makes me wonder if it might just be worth supporting a smaller subset of use cases.
In terms of time for this feature, I am not sure when I will next have time to add features, but I am willing to review any PRs.
In terms of UX, I would generally see:
(1) generate a Lockfile (docker lock generate)
(2) read lockfile, push to new registries via crane.Copy (docker lock migrate)
Upon second thought, I will play around with it in the next week and ping you with an update / code, but feel free to try some ideas out as well if you have time.
In terms of time for this feature, I am not sure when I will next have time to add features, but I am willing to review any PRs.
Upon second thought, I will play around with it in the next week and ping you with an update / code, but feel free to try some ideas out as well if you have time.
Sounds good -- no rush on my side as I am also a bit busy, but I might point some people towards this as a potential solution if they have time to implement it.
@jonjohnsonjr
I have an MVP working for the copy behavior in the branch miperel/migrate. It works with Dockerhub, but when testing it with Azure Container Registry, I ran into issues (opened an issue in crane) that also occur with the crane CLI tool.
One thing I haven't solved is how to rename images across registries. Ideally, you could mirror the structure of the source: docker.io/library/foo -> private.example.com/library/foo But, what if we also have gcr.io/library/foo in an image? There would be a collision.
One other problem with this is that the same structure may not even work. For instance:
This fails:

```
docker tag docker.io/library/redis myaccount/library/redis
docker push myaccount/library/redis
```

but this succeeds:

```
docker tag docker.io/library/redis myaccount/redis
docker push myaccount/redis
```
In light of that, I was thinking that the simplest solution would be to just use the last part of the path, as in the example above.
However, this would be annoying for the case of a project that uses 2 images with the same last path:
bitnami/redis -> myaccount/redis
docker.io/library/redis -> myaccount/redis
I think that this case is somewhat rare though, and the command could throw a warning/error/rename a path in this case, so it could be an acceptable solution. Thoughts?
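A sketch of the last-path-component strategy with collision detection, using the `bitnami/redis` vs `docker.io/library/redis` example above. All names here are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// lastPathRename maps each source reference to account/<last path
// component>, erroring out when two sources would land on the same
// destination.
func lastPathRename(refs []string, account string) (map[string]string, error) {
	out := make(map[string]string)
	seen := make(map[string]string) // destination -> first source that claimed it
	for _, ref := range refs {
		parts := strings.Split(ref, "/")
		dst := account + "/" + parts[len(parts)-1]
		if prev, ok := seen[dst]; ok {
			return nil, fmt.Errorf("collision: %s and %s both map to %s", prev, ref, dst)
		}
		seen[dst] = ref
		out[ref] = dst
	}
	return out, nil
}

func main() {
	_, err := lastPathRename([]string{"bitnami/redis", "docker.io/library/redis"}, "myaccount")
	fmt.Println(err) // reports the collision on myaccount/redis
}
```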
I think that this case is somewhat rare though, and the command could throw a warning/error/rename a path in this case, so it could be an acceptable solution. Thoughts?
Many! So this has bit me in ~four different contexts now, and it feels like I should write something up, but I honestly haven't found a great solution to it. Let me try to enumerate some constraints to explain why I think this is difficult, and one potential path forward:
- Some registries allow 1 or more path components.
- Some registries allow 2 or more path components.
- Some registries allow 3 or more path components.
- Some registries allow exactly two path components.
If your destination is one of the first three cases, this isn't actually too bad. You can just take a configured "root" in the destination registry and append all the path components from the source registry, e.g.:
DESTINATION=dst.example.com/lock
| in | out |
|---|---|
| src.example.com/foo:bar | dst.example.com/lock/foo:bar |
| src.example.com/foo/bar:baz | dst.example.com/lock/foo/bar:baz |
| src.example.com/foo/bar/baz:quux | dst.example.com/lock/foo/bar/baz:quux |
As long as we're unrestricted in the maximum number of path components, you can always choose a DESTINATION that works for your target registry.
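The mapping in the table above could be sketched like this (a hypothetical helper, doing plain string handling rather than real reference parsing):

```go
package main

import (
	"fmt"
	"strings"
)

// underRoot drops the source registry host and nests the remaining path
// (and tag) under a configured destination root.
func underRoot(src, root string) string {
	i := strings.Index(src, "/")
	if i < 0 {
		return root + "/" + src // bare name, e.g. "redis:6"
	}
	return root + src[i:]
}

func main() {
	root := "dst.example.com/lock"
	for _, src := range []string{
		"src.example.com/foo:bar",
		"src.example.com/foo/bar:baz",
		"src.example.com/foo/bar/baz:quux",
	} {
		fmt.Printf("%s -> %s\n", src, underRoot(src, root))
	}
}
```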
The problem is with e.g. the fourth case: if the destination registry allows only a fixed number of path components, how do we flatten an arbitrarily deep source path to fit?
With ko, we ended up adding a bunch of flags to address this (I think flags were a mistake, but whatever):
- `--preserve-import-path` just keeps the entire structure, similarly to the table above.
- `--base-import-paths` does what you suggest, just taking the last path component.
- `--bare` just lets you hardcode the whole path as `DESTINATION` -- because we stick the digests in there, it doesn't really matter, but this would cause tag collisions normally.
- and the default, which is similar to `--bare`, but appends an md5 of the whole path to avoid collisions.
The default is ugly, but it works :/
the command could throw a warning/error/rename a path in this case
This is fine, but what is a user to do if there's an error? We would need some kind of knob to turn, I think. They are not likely to be able to change the names of all their source repositories, as those are often outside of their control.
So I've got a handful of bad ideas to deal with this, but I'm not sure if any are palatable:
- Stick a `map[string]string` in a config file that statically maps `src` -> `dst`.
- Have a hacky config language that is a little less verbose than a map, e.g.:

```
source: "src.example.com/foo/bar/*"
destination: "dst.example.com/foo-bar/*"
```
- Have a way to provide a go template for renaming, possibly with some custom functions for common scenarios.
- Have a way to configure a binary we can shell out to for renaming, which would let you do stuff in bash or whatever language you want e.g.:
```
$ echo 'src.example.com/foo/bar:baz' | configured-rename-binary
dst.example.com/foo-bar:baz
```
I feel like solutions 3 and 4 are maybe overkill, but it's hard to find a method that works for everything.
Not sure if this is helpful or not :)