docker-image-resource
Laundry list of smells from this resource
This resource hasn't aged well. We've long suspected a new approach may be necessary, potentially including a complete rewrite and/or breaking it up into separate resource types, each focused on a specific workflow.
I'm going to use this issue as a dumping ground of sorts for now, in hopes that it eventually leads to a more focused direction that can skewer a few of these piranhas at once.
- There are too many knobs, and way too many ways to put.
- The testing situation is improving, but pretty bad.
- Important parts are written in bash and are subject to the docker CLI output format changing. Some things are already broken because of this.
- There are too many ways to configure credentials.
- There's no caching by default, and it's unclear if everyone would want that. cache: true even pulls down the image willy-nilly, preventing Concourse caching of the image (ironically). (see https://github.com/concourse/docker-image-resource/issues/121, https://github.com/concourse/docker-image-resource/issues/148, https://github.com/concourse/docker-image-resource/pull/188)
- Caching all the time is great and all until it's not. Lines like apt-get update && apt-get -y install foo will need the cache busted in order to get a new foo.
- There isn't a great way to integrate Dockerfiles with Concourse-fetched dependencies while also having the image easy to build locally during development (you'd be missing all the dependencies).
- There's no way to build without pushing to a registry. To be honest, as a resource that makes sense (there must be an external source of truth), but a reusable task would be great for this.
- It embeds a Docker daemon. Would it be easier to point at an external one? What impact might this have on caching? Would this be an acceptable assumption for most people? How would the resource be pointed at the daemon?
There may be value in adopting one of the alternatives that are emerging -- buildah, img, kaniko, etc. There is also a nascent "Container Builder Interface" (CBI) being spruiked, which aims to unify these under a common API.
We can definitely seek advice and input from folks in the Google container tooling teams (tagging @mattmoor, @dlorenc and @ImJasonH), who are attacking this stuff from a lot of angles. @sclevine is also exploring this problem in the context of buildpacks, including looking closely at the various options I linked.
Edit: I should also tag @bparees from Red Hat. They've had to wrestle with Dockerfiles-vs-sanity for OpenShift a bit.
@jchesterpivotal Yeah, I think that's the future. I'm playing with kaniko locally at the moment and it looks very promising for Concourse's use case. There are obviously limitations right now that would prevent us from switching to it (multi-stage Dockerfiles, for example, don't work atm), but I'm sure it'll get there.
So the goal with Kaniko would be to run in a regular unprivileged container and not require spinning up a daemon. The downside I suppose would be that there's no caching, which a lot of people seem to care about. There's a somewhat nebulous open issue for it: https://github.com/GoogleContainerTools/kaniko/issues/102
I think we'd at least need the ability to support the load_bases param since that's a very Concoursey way of controlling the version of the base image it's built from. I don't see a way to do that at the moment.
In terms of load_bases, could I interest you in image-rebase?
Also, there's a lot to be said for breaking docker-image-resource apart into a reusable task and an image-registry-resource.
First, we don't really have reusable tasks as a native concept yet, and it would be a useful one to ship. Second, they can use different bits of software for different tasks. A registry resource could use go-containerregistry to handle the bit shipping.
Hello, great to see that this resource may get a big update/refactor.
My two cents:
Cache is important when there are complex Dockerfiles that result in big images.
I think an external docker daemon is a good option if there is a way to start/stop it or somehow control it. (If I remember correctly, CircleCI uses an external docker daemon.)
Also, with an external docker daemon, wouldn't the cache just work if the daemon is long-running?
Having different resources makes sense in my view. So build, run, and push/pull could be different resources. A CI run would then build and pull any images needed, run them (for tests and integration tests), and, if everything works, push the new version to the registry.
We are starting to use Concourse for some pipelines, so I am not that experienced with it, but our pipelines seem hacky when it comes to running tests and so on. We have some pretty big images whose build process hasn't moved to Concourse yet; we will test it and see if caching works as expected for those. The small ones don't really need cache.
Thanks a lot for organizing this issue @vito. As @vito rightly points out, one of the big issues with this resource is the way that cache is abstracted. A standalone docker daemon has pretty good cache defaults and people are used to them. I think we should emulate "workstation docker" behavior as best as we can, whether from a central cache or per worker cache. Per worker cache seems like the most natural fit for concourse, and I think would still yield fairly good performance (e.g. frequently built images could likely be in cache).
I do not know if concourse supports my ideal cache for docker, or for any other immutable data (because the docs are sparse https://concourse-ci.org/caching-and-retention.html; https://concourse-ci.org/volume-internals.html ). But I would love to see persistent cache that is retained across builds and possibly pipelines for immutable data and only for immutable data. For immutable data such as docker layers or precisely versioned libraries, there are only benefits to caching data between pipelines from an end-user perspective. I think CI/CD is an area that warrants extra complexity in order to achieve higher performance because humans wait on CI/CD. Reducing feedback latency is a big win for development and operations velocity.
Here is my wishlist for this resource, but similar things can be said for any immutable resources:
1. Any layers downloaded from a docker pull should be automatically accessible and used by any docker pull done on the same worker - regardless of whether it was part of a cache warming step, a docker pull, or a docker build. Definitely for builds in the same pipeline, ideally between pipelines. This alone would significantly simplify/supersede several cache options in this resource: cache, cache_tag, load, load_base, and load_bases.
2. Build steps should be automatically re-used. I think this is possible, though I'm not completely sure.
3. The build pipeline should not need to know which docker images to cache (as in load_bases or cache_from), but it should optimally use the cache anyway. Hard-coding these images into the pipeline creates a tight coupling between your higher-level pipeline and your lower-level application build file (the Dockerfile), requiring coordination between the two. Ideally these images would only be referenced once, in the app repo's Dockerfile. You should not have to update your build pipeline when you update your Dockerfile, unless there is a reason not to use the defaults. Items 1 and 2 above should make this step possible.
To re-use build steps within a dedicated resource, such as in docker-image-resource's cache_from: would it make sense to add a special tag to the build, e.g. __resource-cache, to later be consumed by a subsequent build, e.g. --cache-from=this-image:__resource-cache? The special tag could, by default, be created on all builds. That should yield fairly decent cachability for builds. We could similarly, by default, automatically add tags to intermediate stages in a multi-stage build, as in this-image:__resource-cache-stage-2-4 (read: stage 2/4). Just spitballing here.
@chrishiestand Yeah, I agree with just about all of that. I'd be really interested in seeing a fork of this resource that is pointed at a centrally deployed Docker daemon, instead of spinning up its own. That would at least get you a lot of the desired caching behavior for free.
The downside of this approach is it suddenly adds a hard dependency on a Docker daemon, which isn't commonly deployed standalone, at least in my experience. A lot of people are going to want to forward their existing docker.sock file, which won't currently be possible as resource containers are entirely isolated from the host (by design). That approach also won't be relevant at all to folks deploying without Docker.
So you could deploy one on its own and configure it to listen on TCP (maybe with TLS?), and then point your pipelines at it. You'd get much better caching behavior, at the expense of your builds now having to upload their context to a remote Docker daemon as part of the build. Probably a net benefit, but unfortunately it takes a bit more work to set up.
I think it's a worthwhile experiment, and shouldn't be too hard to implement as a fork of this resource. It should just be a matter of ripping out all the Docker daemon setup and the params related to it, and then targeting the configured remote daemon with the CLI.
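For illustration, such a fork could boil down to pointing the stock docker CLI at the central daemon via its standard environment variables. A minimal task-shaped sketch, assuming a daemon is already listening at tcp://docker.example.com:2375 (the hostname, port, and use of the library docker image for its CLI are all assumptions; a real setup would want TLS):

platform: linux

image_resource:
  type: docker-image
  source: {repository: docker}

inputs:
- name: source   # build context containing a Dockerfile

params:
  # standard docker CLI env var; assumed endpoint of the central daemon
  DOCKER_HOST: tcp://docker.example.com:2375

run:
  path: sh
  args:
  - -ec
  - |
    # the context gets uploaded to the remote daemon as part of the build
    docker build -t example/app:ci source

The same DOCKER_HOST-style configuration is what a forked resource would pass to its internal docker invocations.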
In terms of alternative implementations, I would think about splitting it into two parts.
The first would be an image-build-task, which under the hood could use Kaniko to avoid a dependency on a Docker daemon. The second would be an image-repository resource which operates on images, not Dockerfiles. The latter could use go-containerregistry to avoid another Docker dependency.
Having the split means that I can also come up with further permutations without needing a single resource to become increasingly busy:
- I can substitute an image-build-task for a jib-build-image-task, a ko-build-image-task, or even a docker-daemon-build-image-task if I am prepared to isolate the daemon onto a separate machine. One can also imagine build tasks based on Buildah, Orca, CBI, img, etc.
- I can substitute gcs-bucket-resource for image-registry, because I know GCR simply acts as a clever facade over GCS. Or I can use artifactory-resource. Or I can do something else entirely.
Finally the split opens up seams to allow the introduction of layer-by-layer logic. Right now images are treated as an all-or-nothing affair, but in the near future it will become more common for efficiency and security purposes to think about layers instead of images. Breaking apart the build task from the registry resource will help.
I'm making the assumption that redistributable tasks exist and are based on more than shared conventions. I think splitting up docker logic would be a decent way to explore what would be necessary for a higher-level Task concept.
Edit: I also just noticed I already said all of this previously, but more concisely.
@jchesterpivotal
Kaniko won't help with caching yet, but I still think it's worth trying, at least to remove the Docker-in-Docker aspect while preserving portability. For what it's worth, there has since been more discussion in https://github.com/GoogleContainerTools/kaniko/issues/102 which shows that their goals there are pretty well aligned with Concourse's (caching FROM but not individual RUN commands, for example, which makes a lot of sense for CI).
I'm making the assumption that redistributable tasks exist and are based on more than shared conventions. I think splitting up docker logic would be a decent way to explore what would be necessary for a higher-level Task concept.
Fully agreed. See recent discussion in https://github.com/concourse/docker-image-resource/pull/185 - there's currently way too much friction when it comes to redistributable tasks, which forces people to use resources for things they shouldn't be used for.
So I like the idea of splitting this up, because there's already clearly a need for build-without-push. (I'm also happy to see any proposal which cuts this resource up into smaller pieces, so that alone is great.)
@jchesterpivotal Here's a proof-of-concept using Kaniko to build, as a Concourse task:
---
platform: linux

image_resource:
  type: docker-image
  source: {repository: gcr.io/kaniko-project/executor}

inputs:
- name: source

outputs:
- name: image

run:
  path: /kaniko/executor
  args:
  - --dockerfile=/tmp/build/e55deab7/source/Dockerfile
  - "--destination=doesnt/matter:test"
  - --context=/tmp/build/e55deab7/source
  - --tarPath=/tmp/build/e55deab7/image/image.tar
Full usage/output here: https://gist.github.com/vito/ab90980adbeaf000739fff6a64e0407d
One gotcha is I had to use absolute paths for the various flags, because the first thing Kaniko does is cd /. But that's a minor nit. It so happens that fly execute'd builds have a predictable build path, so I worked around it by hardcoding it. That's all I can really do since the image is built FROM scratch and literally only contains the Kaniko executor. We'd have to find a better solution if we want to do this for real.
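One possible way around the hardcoded paths, assuming the executor's :debug image variant still ships its busybox shell: run the executor through the shell so the build directory can be resolved at runtime instead of being baked into the args. A sketch:

platform: linux

image_resource:
  type: docker-image
  source: {repository: gcr.io/kaniko-project/executor, tag: debug}

inputs:
- name: source

outputs:
- name: image

run:
  path: /busybox/sh
  args:
  - -ec
  - |
    # $PWD is this build's working directory, so nothing is hardcoded
    /kaniko/executor \
      --dockerfile=$PWD/source/Dockerfile \
      --context=$PWD/source \
      --destination=doesnt/matter:test \
      --tarPath=$PWD/image/image.tar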
I also ran into this issue when trying to build a multi-stage image: https://github.com/GoogleContainerTools/kaniko/issues/245
I've started to implement a new resource for fetching images from a registry: https://github.com/concourse/registry-image-resource
See the README for comparison/goals/etc.
So far so good. Testflight and TopGun are both passing with it as a drop-in replacement for docker-image when used for task and resource type image fetching.
It's super nice that this is being discussed. If it helps, the workaround I have for now is:
- have a task to build and test docker images (docker build && inspec && docker save)
- use the current docker resource to push to a registry only (to have a way of handling images between jobs)
This is for our external images; the testing step is basically why I am using a task instead of the old docker-image resource.
The main pain point for me is caching when it comes to batch-building images: I have one folder for all the images I use inside Concourse and the whole folder is rebuilt on changes (internal stuff, so image testing is not such a priority). I've just now started looking at improving this a bit, and caching comes up now :-)
So far I think all the proposals here could improve things for me. I will try to use the new resource here to do some experiments, and possibly even look at kaniko. I will continue looking at multi-arch builds, so this might be a good chance for me to spend some more time on this. I hope this can be useful input to the design process here, although admittedly I'm not sure that aarch64/amd64 dual builds are much of a priority for others.
I redid @vito's kaniko experiments; for me there are a few disadvantages:
- kaniko still seems a bit early (like you mentioned, with the caching and multi-stage builds). I agree that some of it will be possible to fix, but I'm unsure about the caching. I'm not sure what's best; I'm just saying that in some ways it feels tempting to reuse the dockerd logic for this (i.e. still use dind and "somehow" get dockerd to handle the caching)...
- it's less multi-arch capable (currently it depends on an amd64 build of some credential helper, for example). I mean, it could be solved; I'm just comparing with the existing dind solution.
- the image I tried to build required a privileged container for kaniko anyway (something about permissions of the "chage" binary). That might be more of an upstream bug in RHEL, but there might be more like it (I admit I haven't investigated this properly).
In many ways I think that a resource for building based on dind and the Go bindings could be a good compromise. I don't feel that depending on dind is a huge issue in itself when it comes to building.
The current resource is multi-arch capable by using the alpine repo/architecture ecosystem.
Caching, in my mind, is somewhat independent, and it doesn't seem like there are easy solutions for it. I discovered that in my scenario the biggest issue is that I have a slow connection to Docker Hub. I'll experiment a bit, but maybe this can be better solved by always using a local registry (to which there is a good connection) and having it proxy Docker Hub, so that the caching functionality is dealt with there instead (at least in my mind that keeps the architecture clean). In that case the split into registry and building resources makes a lot of sense: we keep the components the way they were envisioned, we avoid implementing caching logic ourselves, and we can get rid of the command-line output parsing (if there still is any).
I'll take a look at @vito's registry resource and experiment with registry proxying/mirroring (it turns out it's a thing: https://docs.docker.com/registry/recipes/mirror/).
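For reference, the registry side of that recipe is mostly one extra stanza in the registry's config. A sketch with placeholder storage path and port, auth and TLS omitted; the daemon (or resource) doing the pulls then has to be told to use the mirror, e.g. via dockerd's registry-mirrors setting:

version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  # turns this registry:2 instance into a pull-through cache of Docker Hub
  remoteurl: https://registry-1.docker.io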
Re: caching, https://github.com/GoogleContainerTools/kaniko/issues/300 is the current best idea for enabling fast incremental builds in kaniko.
To me, relying on dind means you simply cannot build on a Kubernetes cluster unless you trust every user on the cluster. By removing the need to trust each user, you can get closer to providing a build service on a single large multitenant cluster instead of having to isolate each user on their own cluster, which unnecessarily wastes resources.
/cc @priyawadhwa @dlorenc @mattmoor
@neumayer - what kaniko image did you use? Nothing we ship as officially supported from the kaniko repo should require the privileged flag.
@dlorenc I think you are right, I must have used some old image at some point in my experiments, I don't seem to be able to reproduce the privilege issue.
I whipped up a task that builds a Docker image using img:
- https://github.com/concourse/builder
What's neat is caching was pretty easy to support via Concourse task caches, and they're easy to invalidate if needed with fly clear-task-cache.
We're using this in our new pipeline for the unit-image and dev-image jobs:
- https://github.com/concourse/concourse/blob/4f0e1d8702db756c4a485b52e992573d4c933f18/ci/pipelines/concourse.yml#L84-L149
Note that this doesn't yet support running without privileged: true.
@vito just FYI, kaniko currently supports multistage builds and opt-in remote layer caching for RUN commands (we're working on adding more optimizations to caching as well!).
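For anyone wanting to try that with the task sketches above: per the kaniko docs, the layer cache is opt-in via --cache and stores cached layers in a registry, so the executor args would grow roughly like this (the cache repository name here is a placeholder):

run:
  path: /kaniko/executor
  args:
  - --context=context
  - --destination=registry.example.com/app:latest
  - --cache=true                                  # opt-in layer caching for RUN steps
  - --cache-repo=registry.example.com/app/cache   # registry repo used to store cached layers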
@vito with the Concourse builder, do we consider the laundry list done? If not, could we extract the items that aren't done into a new task list for better tracking? Thx!
I recently tried adopting these new container building tools into my pipelines. I settled on a pretty simple approach using Kaniko in a task. Very similar to @vito's proof-of-concept example above (but having the benefit of two more years of Kaniko development).
Notable:
- does not require privileged: true
- ~supports caching~
- pairs well with registry-image-resource
- (optionally) creates deterministic containers by using Kaniko's --reproducible feature
- I did not run into the gotcha Vito mentioned about absolute paths
I feel like this is a meaningful improvement over the current-state docker-image-resource and over oci-build-task. Is there anything beyond sharing some pipeline recipes that would benefit the community? Perhaps replacing the oci-build-task recommendation in the registry-image-resource README?
@lnhrdt Awesome! Is there anywhere I can take a look and try it out? Cool that it works without privileged. If it's a straight-up improvement, maybe I could swap out buildkit for kaniko in oci-build-task. 🤔 It's not meant to be tied to Buildkit forever.
Hey @vito here's a self-contained pipeline that demonstrates the approach. Add your own registry creds and fly away!
resources:
- name: test-image
  type: registry-image
  source:
    repository: ((registry-host))/test
    username: ((registry-username))
    password: ((registry-password))

jobs:
- name: image
  plan:
  - task: create-image-source
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: alpine
      outputs:
      - name: image-source
      run:
        path: sh
        args:
        - -ecu
        - |
          cat > image-source/message.txt <<EOF
          abc123
          EOF
          cat > image-source/Dockerfile <<EOF
          FROM alpine
          COPY message.txt /message.txt
          EOF
  - task: build-image
    input_mapping:
      context: image-source
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: gcr.io/kaniko-project/executor
      inputs:
      - name: context
      outputs:
      - name: image
      run:
        path: /kaniko/executor
        args:
        - --context=context
        - --reproducible
        - --tarPath=image/image.tar
        - --no-push
        - --destination=noop # https://git.io/JJDXu
  - put: test-image
    params:
      image: image/image.tar
other thoughts
In my last comment I was mistaken about supporting caching. Kaniko has a caching mechanism but it works by caching in a repository. It's not compatible with the Concourse task caching paradigm. There are some recent issues in their repo that discuss adding support for "local caching" which would allow us to cache in a Concourse task (GoogleContainerTools/kaniko#1148, GoogleContainerTools/kaniko#923). I don't know how fast ideas move in the project but I'm keeping an eye on those.
I thought about swapping out buildkit for kaniko in oci-build-task too, but I ended up thinking that simply replacing it with a "task recipe" would be better because I didn't need to extend their image to add anything. Using Kaniko in a Concourse task was just a matter of mapping inputs to the right arguments. In the future this seems like a fantastic use case for concourse/rfcs#7 but since that's still in idea land, I thought a "task recipe" could help some folks out.
That --reproducible feature in Kaniko is pretty fantastic in this Concourse use case. I can build images, push them as resources, and only trigger subsequent jobs if the inputs to the image were actually new, because the image digest depends only on the inputs, not the time it was built.
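Concretely, with the test-image resource from the pipeline above, a downstream job (added under jobs: next to the image job) only runs when the digest actually changes, since a byte-identical image yields the same version:

- name: use-image
  plan:
  - get: test-image
    trigger: true     # only fires when the put produced a new digest
    passed: [image]
  # ...steps that consume the image...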
@lnhrdt Sweet, thanks for sharing the code and the context! I'll poke around.
It kind of sounds like Kaniko could provide a solution for vito/oci-build-task#1, which has been blocked on buildkit upstream for a long time. At least if I'm interpreting their README correctly. (Though reading your linked issue, https://github.com/GoogleContainerTools/kaniko/issues/1148, it sounds like this might not work how I want it to.)
FYI: you may be interested in checking out the Prototypes RFC - it'll become the foundation for all shareable executable "things" in the future: resources, var sources, notifications, in addition to arbitrary actions with the run step (i.e. concourse/rfcs#7). There are more details in the Re-inventing resource types blog post.
A hint for anyone who gets to this before I can: Kaniko expects full digests in the FROM line for caching to work. Which obviously isn't how folks plan to use Concourse, since the get is what determines the digest.
However, we could have a pattern like this:
FROM ${alpine_image:-alpine}
This will work locally (Dockerfiles respect the ${:-} notation), and then we could have a Kaniko task wrapper that automatically places image caches in the proper directory and then passes the full image reference (alpine@<digest>) as a build arg (alpine_image in this case).
I like that this makes the cache usage more explicit, though it is a bit more verbose.
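A rough sketch of the digest-passing half of that wrapper, assuming the registry-image get of an alpine resource writes a digest file, and using the executor's :debug image so there's a shell to read it with (the ARG line in the Dockerfile and the resource/input names are assumptions):

platform: linux

image_resource:
  type: docker-image
  source: {repository: gcr.io/kaniko-project/executor, tag: debug}

inputs:
- name: source        # Dockerfile starts with:  ARG alpine_image
                      #                          FROM ${alpine_image:-alpine}
- name: alpine-image  # get of a registry-image resource tracking alpine

outputs:
- name: image

run:
  path: /busybox/sh
  args:
  - -ec
  - |
    # pin the base image to the exact digest Concourse fetched
    digest="$(cat alpine-image/digest)"
    /kaniko/executor \
      --context=$PWD/source \
      --build-arg=alpine_image=alpine@$digest \
      --no-push \
      --destination=noop \
      --tarPath=$PWD/image/image.tar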
This is a train of thought. Notes mainly.
Based on @lnhrdt's work, I've tried the caching mechanism (pipeline included below).
The warmer pulls images from an available container image registry into the cache. This allows subsequent image builds to use that cache for the base image, as demonstrated with Dockerfile.1 (below).
The issue with caching is that the image built as first is not cached for use with Dockerfile.2.
It appears that the caching mechanism doesn't support files that have not been pushed to a registry.
The cache input/output contains two files per image:
- sha256:some-sha - a tarball of the image pulled from the registry
- sha256:some-sha.json - a manifest describing the layers in the tarball
The sha256:45c6f8f1b2fe15adaa72305616d69a6cd641169bc8b16886756919e7c01fa48b.json for ubuntu looks like:
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 3352,
    "digest": "sha256:56def654ec22f857f480cdcc640c474e2f84d4be2e549a9d16eaba3f397596e9"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 26701612,
      "digest": "sha256:171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 852,
      "digest": "sha256:419640447d267f068d2f84a093cb13a56ce77e130877f5b8bdb4294f4a90a84f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 162,
      "digest": "sha256:61e52f862619ab016d3bcfbd78e5c7aaaa1989b4c295e6dbcacddd2d7b93e1f5"
    }
  ]
}
When generating the image first, the tarball contains a manifest.json, which may be enough information to generate the above file.
For example,
[{"Config":"sha256:2bc60245855ee71a32c1377437833a38b9371db85a0c27c453ac0d23febbda57","RepoTags":["first"],"Layers":["be79478c86a9fce917b32c6d275e2086022d8951d7aa3d2070617c2b4d6644c7.tar.gz","69ab70f4eb4b0b13e03640757fb2ececa1d31d933d52d9651acc8679320bbf3f.tar.gz","3c829501b976555ea693ab2581721b1dc8e7bb62872732d2ccc0bfea99d17422.tar.gz","36a89f438795720740424e61aa398474b9c97b4db3806502d054e2830a9da37f.tar.gz","a7ed845d8b0dc083d5ba40f2e59900f6e48a76b63d0b9055efe5b981ae96766b.tar.gz"]}]
The pipeline.yml:
jobs:
- name: create-container-images
  plan:
  - task: create-dockerfiles
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: ubuntu
      outputs:
      - name: context
      run:
        path: bash
        args:
        - -c
        - |
          set -eux
          cat > context/Dockerfile.1 <<EOF
          FROM ubuntu
          RUN apt-get -y update
          RUN apt-get -y install vim
          EOF
          cat > context/Dockerfile.2 <<EOF
          FROM first
          RUN apt-get -y install curl
          EOF
  - task: cache
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: gcr.io/kaniko-project/warmer
      caches:
      - path: cache
      outputs:
      - name: cache
      run:
        path: /kaniko/warmer
        args:
        - --cache-dir=cache
        - --image=ubuntu
  - task: build-packages
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: gcr.io/kaniko-project/executor
      inputs:
      - name: cache
      - name: context
      - name: images
        optional: true
      outputs:
      - name: images
      - name: cache
      run:
        path: /kaniko/executor
        args:
        - --context=context
        - --reproducible
        - --cache-dir=cache
        - --no-push
        - --destination=first
        - --dockerfile=context/Dockerfile.1
        - --tarPath=images/1.tar
        - --image-name-with-digest-file=images/1.json
  - task: build-binaries
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: gcr.io/kaniko-project/executor
      inputs:
      - name: cache
      - name: context
      - name: images
        optional: true
      outputs:
      - name: images
      - name: cache
      run:
        path: /kaniko/executor
        args:
        - --context=context
        - --reproducible
        - --cache-dir=cache
        - --no-push
        - --destination=second
        - --dockerfile=context/Dockerfile.2
        - --tarPath=images/2.tar
        - --image-name-with-digest-file=images/2.json
    ensure:
      task: inspect
      config:
        platform: linux
        image_resource:
          type: registry-image
          source:
            repository: ubuntu
        inputs:
        - name: cache
        - name: context
        - name: images
        run:
          path: bash
          args:
          - -c
          - |
            set -eux
            sleep 1000
As a follow-up to https://github.com/concourse/docker-image-resource/issues/190#issuecomment-678524466 - I was able to use the same build-arg-to-FROM pattern in the oci-build task v0.7.0 to implement support for 'image args'.
With this feature implemented we're much closer to fully deprecating the docker-image resource. I think this can happen once the oci-build task is converted to a prototype (concourse/rfcs#37), since the UX of re-usable tasks is still pretty rubbish. We're implementing the first use case for prototypes now (https://github.com/concourse/concourse/issues/5229) so it shouldn't be too hard to add a run step soon!
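For anyone following along, the image args feature reads roughly like this in a pipeline, per my reading of the oci-build task README around v0.7.0 (the repository and input names here are examples, not prescriptions): an IMAGE_ARG_<name> param points at a fetched image tarball, and the task exposes it as a build arg the Dockerfile can use in its FROM line.

- task: build
  privileged: true
  config:
    platform: linux
    image_resource:
      type: registry-image
      source: {repository: vito/oci-build-task}
    inputs:
    - name: source        # Dockerfile uses: ARG base_image / FROM ${base_image}
    - name: base-image    # registry-image get with format: oci
    outputs:
    - name: image
    params:
      CONTEXT: source
      IMAGE_ARG_base_image: base-image/image.tar
    run: {path: build}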
@lnhrdt, thanks. I am able to build an image using kaniko in Concourse now.