buildkit
buildkit copied to clipboard
RFC: dockerfile: support `Dockerfile.sum` for pinning sources
EDIT: the current PR:
- https://github.com/moby/buildkit/pull/2943
Dockerfile.sum is an equivalent of go.sum but s/go/Dockerfile/ .
The content is a subset of BuildInfo:
{
"sources": [
{
"type": "docker-image",
"ref": "docker.io/library/alpine:latest",
"pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
},
{
"type": "http",
"ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
"pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
}
]
}
When Dockerfile.sum exists in the context, the Dockerfile builder does:
- Pinning the digest of
docker-imagesources (FROM ...) - Pinning the digest of
httpsources (ADD https://...) - Recording the consumed entries to the build info structure (
["containerimage.buildinfo"].consumedPin)
In the future, Dockerfile should also support ADD <gitref>. and pinning its commit hash.
- https://github.com/moby/buildkit/issues/2799
POC
https://github.com/AkihiroSuda/buildkit_poc/commits/pin-poc.20220411-0
:warning: The filename and the format of the "Dockerfile.sum" are subject to change.
$ cat Dockerfile
FROM alpine
ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md
$ cat Dockerfile.sum
{
"sources": [
{
"type": "docker-image",
"ref": "docker.io/library/alpine:latest",
"pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
},
{
"type": "http",
"ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
"pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
}
]
}
$ sudo buildctl build --frontend dockerfile.v0 --local dockerfile=. --local context=. --metadata-file metadata.json
[+] Building 3.0s (6/6) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 603B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 2.8s
=> [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454 0.0s
=> => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454 0.0s
=> https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md 0.0s
=> CACHED [2/2] ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md 0.0s
$ cat metadata.json
{
"containerimage.buildinfo": {
"frontend": "dockerfile.v0",
"sources": [
{
"type": "docker-image",
"ref": "docker.io/library/alpine:latest",
"pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
},
{
"type": "http",
"ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
"pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
}
],
"consumedPin": {
"digest": "sha256:42b78052859819b268e047da95512b20d2e64991d662e4af9f286d743f20b2d4",
"sources": [
{
"type": "docker-image",
"ref": "docker.io/library/alpine:latest",
"pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
},
{
"type": "http",
"ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
"pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
}
]
}
}
}
When a docker-image pin is wrong:
$sudo buildctl build --frontend dockerfile.v0 --local dockerfile=. --local context=. --metadata-file metadata.json
[+] Building 1.6s (3/3) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 603B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> ERROR [internal] load metadata for docker.io/library/alpine:latest 1.4s
------
> [internal] load metadata for docker.io/library/alpine:latest:
------
Dockerfile:1
--------------------
1 | >>> FROM alpine
2 | ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md
3 |
--------------------
error: failed to solve: alpine: docker.io/library/alpine:latest@sha256:fedbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454: not found
When an http pin is wrong:
$ sudo buildctl build --frontend dockerfile.v0 --local dockerfile=. --local context=. --metadata-file metadata.json
[+] Building 0.6s (5/6)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 603B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454 0.0s
=> => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454 0.0s
=> ERROR https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md 0.3s
------
> https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md:
------
error: failed to solve: digest mismatch sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53: sha256:fe4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53
@tonistiigi @crazy-max @ktock PTAL 🙏
Looks like a nice way to provide/show the buildinfo feature to the end user, thanks @AkihiroSuda!
Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).
Atm metadata resolution is not cached in the frontend (I think it uses the containerimage metaresolver which lacks of cache support). I recall about https://github.com/moby/buildkit/issues/706 and wonder if we should not look first at improving the cache experience on this side and maybe align it with your proposal.
I will think a bit more about your proposal but sgtm!
This looks awesome.
I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.
Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.
Some things to consider here:
BuildInfo is a result for a specific build, not a specific Dockerfile. A Dockerfile can have lots of targets depending on different images. And with the latest Buildx you can have a single build (one BuildInfo) from multiple Dockerfiles. If sum file is created manually, this isn't really a big issue as it only affects cases where different targets would use different pins. But I think the intent is to use sums from a previous build or generate this file.
BuildInfo in v0.10 also contains Attrs field that is opt-in for now, not only sources. We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.
Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well. We already added --build-context that includes lots of this functionality. You can already do a build that sends the pins for previous images, and they are used even if image tags have been updated. This only works for images atm though (and build stages), not git/http URLs etc. and assumes that client-side can already attach the pins to the build request. As an example, @crazy-max and I discussed possibility for --build-context <image-ref-with-embedded-buildinfo> and --build-context <previous-build-metadata-file-with-buildinfo>.json. This would load the pins (and config) from a previous build.
I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.
@crazy-max
Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).
The current POC does not increase the numbers of HTTP request AFAICS, and I don't see increased possibility of hitting the rate limit.
Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.
:+1:
@dlorenc
I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.
Is it better to change Dockerfile.sum to something else like Dockerfile.pin or Dockerfile.lock ?
@tonistiigi
We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.
Maybe we should have a version field or mediaType field for extensibility?
Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well.
Frontend, because the image meta resolver has to be called before generating LLB. The pin format is designed to be portable to other frontends though.
I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.
Should this be moved to exporter metadata?
Sounds very cool :+1:
https://github.com/AkihiroSuda/buildkit_poc/commit/99071c352a6154262f73b453e547a0b0adac4839#diff-50c7e2f4bb8c30ace7a5a6d2858ade67023db56d633f887b26768afae38ece7dR76
// HTTPChecksum returns the checksum if url is present in a.Pin. // Otherwise returns an empty string without an error.
Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)
And +1 to @crazy-max 's docker build(x) tidy
Maybe, Dockerfile or something can provide the user a way to declare how to acquire a checksum value of a dependency (e.g. downloading SHA256SUMS file from github release page).
Would something like this help? https://twitter.com/_ctlfsh/status/1250596462502703104
Cosign has also explored adding a CLI to resolve image references in a Dockerfile so they're more reproducible/verifiable: see https://github.com/sigstore/cosign/issues/707 and PR https://github.com/sigstore/cosign/pull/1120
If there was one reliable, canonical way to do this, that would be a huge improvement.
Also considering something like a Dockerfile.mod where such locks could be added (as they are in go.mod), however a separate .lock file may be good here just because many times you don't want to actually pin resources (e.g. images like ubuntu:18.04 you probably want the most up to date version).
Just a thought: how much do you want to pin? Pinning the FROM image should be straightforward, but pinning the state of software at the time of build is much harder. In order to do tern lock Dockerfile, we had to pull out the created_by commands from the Dockerfile, pull out the environment variables from the config, do a variable expansion in the created_by string, and write it back. For individual components, we had to first inventory the image and then replace all instances of the package in the Dockerfile with a pinned dependency. This doesn't work for ADD/COPY statements after build, especially if you are copying specific files with no context.
@ktock
Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)
No, not all entries have to be / can be pinned.
Especially http source entries.
@nishakm @imjasonh The goal of this PR is to cover http sources (ADD https://...) and git sources (ADD git://...) as well as docker-image (FROM ...) sources. Tern and Cosign do not seem capable to support non-image sources.
Opened PR:
- ~https://github.com/moby/buildkit/pull/2816~ https://github.com/moby/buildkit/pull/2943
I don't understand the purpose of this proposal. If you really want to pin the base image, you can already do this by referencing the image digest instead of the tag. Also, the image tag could come from a build arg, so I don't see how you could have one authoritative pin/lock/sum file in that kind of circumstance.
I think instead of this, there should just be a way to specify the digest as part of the ADD instruction itself, just like you can already do for FROM.
@rittneje
There's a couple of reasons:
- Not having to change the source definition to pin the resources
- Not having to change the same definition in multiple places in the same source
- Applying the same pins across multiple builds.
@cpuguy83 I'm assuming by "source" here you mean the Dockerfile.
Not having to change the source definition to pin the resources
If you use build args for the pins then you don't need to change the Dockerfile per se, just whatever is supplying the args. Also, if you are downloading files within RUN via curl or the like, then already you should be doing something similar for sha256sum. So actually, it is preferable to have it all in the Dockerfile, rather than half in the Dockerfile and half in some new lock/pin/sum file.
Not having to change the same definition in multiple places in the same source
Build args already handle this too. For example:
ARG IMAGE_DIGEST=sha256:...
FROM ubuntu@${IMAGE_DIGEST} AS install
...
FROM ubuntu@${IMAGE_DIGEST} AS final
...
Applying the same pins across multiple builds.
I don't know what this means.
Build args do not handle this. Build args require the author of the Dockerfile to add those args explicitly, and everywhere. It also requires the person performing the build to know what the args are. It also assumes the person building is calling "docker build" directly as opposed to something like "make build".
I don't know what this means.
Multiple dockerfiles or build targets in one repo.
@cpuguy83 One thing that is unclear to me from this proposal is what specifically happens when a tag gets changed. For example, suppose I do FROM python:3.8. At the time that resolved to sha256:abc123, so that gets recorded in the lock/sum/pin file. But now that tag is pointing to sha256:def456 in DockerHub. Does buildkit fail the build (because it pulls python:3.8 and now the digests don't match), or does it essentially rewrite FROM python:3.8 into FROM python@sha256:abc123? A similar situation unfolds for ADD git.
If you have it pinned, it doesn't matter what digest the tag points to in DockerHub, you've pinned it to what you want it to be.
Does buildkit fail the build (because it pulls
python:3.8and now the digests don't match), or does it essentially rewriteFROM python:3.8intoFROM python@sha256:abc123?
Latter for regular tags, former for immutable tags
- https://github.com/opencontainers/distribution-spec/pull/320
Slightly related;
- https://github.com/moby/buildkit/issues/975
- Feature request: Abbility to verify checksum with the ADD command #975
Yes, this proposal covers #975, but provides a generic way for multiple LLB source ops
Yes, to some extend this can cover that feature request (but likely would require one to manually create the Dockerfile.sum (which also feels a bit awkward).
I haven't looked in-depth, but wondering how this proposal works with build-args (which can influence the build-steps, and thus what "checksum" to look for?), but I guess in that case it's an exercise to the user to create entries for every variant 🤔
How would this work when working with a multi-arch base image? I haven't really worked with them, but my understanding is you would have multiple images with the same tag but different digests (one per os/arch).
For multi arch it can refer to the manifest index (which is the "wrapper" tying together the image variants)
We have source-policy-file now: https://github.com/moby/buildkit/blob/v0.12.2/docs/build-repro.md#reproducing-the-pinned-dependencies
{
"rules": [
{
"action": "CONVERT",
"selector": {
"identifier": "docker-image://docker.io/library/alpine:latest"
},
"updates": {
"identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
}
},
{
"action": "CONVERT",
"selector": {
"identifier": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md"
},
"updates": {
"attrs": {"http.checksum": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"}
}
}
]
}