buildkit
buildkit copied to clipboard
dockerfile: implement `ADD --checksum=<checksum> <http src> <dest>`
e.g.,
# syntax=docker/dockerfile-upstream:master
# ↑ Will be included in `syntax=docker/dockerfile:1.5`
FROM scratch
ADD --checksum=sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz /
Fix #975
Oh, this is great! Thanks for working on this! I was just discussing this feature with @crazy-max last week.
Following the related ticket, I was thinking about this, and I can see potential use-cases that would also want to use this for local files (so also for COPY?), thinking of cases like;
A local Dockerfile used to build from a build-context elsewhere; in that case the Dockerfile wants to check that the file in the "elsewhere" context has the expected checksum;
docker build -f ./Dockerfile /path/to/build-context/
docker build -f ./Dockerfile https://github.com/foo/bar.git
docker build -f- https://github.com/foo/bar.git <EOF
FROM foo
ADD --checksum=$CHECKSUM file.tar.gz .
EOF
Second thought (but perhaps feature creep / YAGNI); do we expect other verification methods to be added? If so, do we want this to be more generic (--verify)?
Thinking of;
FROM scratch
# verify checksum
ADD --verify=checksum=sha256:<checksum> some-file .
# verify GPG signed
ADD --verify=gpg=<GPG key> some-file .
potential use-cases that would also want to use this for local files (so also for COPY?)
Yes, but checksum verification is currently not implemented in the FileOp.
do we expect other verification methods to be added? If so, do we want this to be more generic (--verify)?
I think that will be in the source policy engine (OPA/Rego?) https://github.com/moby/buildkit/pull/2943#discussion_r964712652
Yes, but checksum verification is currently not implemented in the FileOp.
I'm not too familiar with all the internals, so my comment is purely "user perspective";
- Is it (technically) possible? (if not now, then in future?)
- What happens with the current PR if you try to use
--checksumfor a local file (withADD)?
- Is it (technically) possible? (if not now, then in future?)
Yes, but probably in a separate PR.
- What happens with the current PR if you try to use
--checksumfor a local file (withADD)?
Returns errors.New("checksum can't be specified for non-HTTP sources")
Yes, but probably in a separate PR. Returns
errors.New("checksum can't be specified for non-HTTP sources")
Thanks! Expanding in future is fine (as long as we indeed have a good error message before that).
I'll let others give their input as well on --checksum vs --verify 👍
This could be handy but I wonder if we could not put instead the chesksum as suffix after @ char like we do with the image digest:
FROM scratch
COPY --from=moby/buildkit:latest@sha256:67c9251f9f2e103e1ee489b6cead518b6d82607ef485d3f1505fc4095a55ebeb /usr/bin/buildkitd /usr/bin/
So it would not require an extra flag?:
# syntax=docker/dockerfile-upstream:master
# ↑ Will be included in `syntax=docker/dockerfile:1.5`
FROM scratch
ADD https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz@sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d /
Looking at the RFC
Reserved:
Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.
@ is a reserved char and should be encoded so it might be ok to use it as separator.
What about the cases like:
ADD https://..../releases/release-$TARGETOS-$TARGETARCH.tar.gz /
Is there some pattern, how checksums could be validated for this case?
https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz@sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d
This can be misinterpreted like
proto=https
username=mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz
host=sha256
port=24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d
What about the cases like:
ADD https://..../releases/release-$TARGETOS-$TARGETARCH.tar.gz /Is there some pattern, how checksums could be validated for this case?
This should work?
FROM scratch AS foo-linux-amd64
ADD --checksum=sha256:deadbeef https://..../releases/release-linux-amd64.tar.gz /
FROM scratch AS foo-freebsd-riscv64
ADD --checksum=sha256:cafebabe https://..../releases/release-freebsd-riscv64.tar.gz /
ARG TARGETOS
ARG TARGETARCH
FROM foo-$TARGETOS-$TARGETARCH AS foo
LGTM but maybe we should keep it in
labsfor a release in case there is a need for something like--verifyas well.
Do you think it should be --verify=checksum=XXX (https://github.com/moby/buildkit/pull/3093#issuecomment-1240337225) to be prepared for that, or "unlikely" that other options will be added?
LGTM but maybe we should keep it in
labsfor a release in case there is a need for something like--verifyas well.Do you think it should be
--verify=checksum=XXX(#3093 (comment)) to be prepared for that, or "unlikely" that other options will be added?
If we adopt this syntax, can we skip the labs?
If we adopt this syntax, can we skip the labs?
I would prefer not. I don't actually see how the --verify could be extended atm. For something like the GPG signature check, you would need to provide a trust chain with the full public key so I don't know how that flag could be used in its current form. If the key pulled from keyserver based on fingerprint then I guess the --verify syntax could work (or it could as well be --pgp then) but then we have a dependency on network access.
If we don't expect other options to be added, then --checksum SGTM
@AkihiroSuda any reason you put this one back in draft?
@AkihiroSuda any reason you put this one back in draft?
For moving this to the labs channel.
Moved to the labs. This should be ready to merge now.
@AkihiroSuda check the linter
Fixed
I tried to get this to work with sha512. It doesn't seem to. Any plans to enable the longer digest length?
I was wanting to rewrite this Dockerfile with this new syntax. I was able to with sha256 but not sha512.
https://github.com/dotnet/dotnet-docker/blob/c02e853354cba294b3ac934ef0c5b57a2109811e/src/runtime/8.0/bookworm-slim/amd64/Dockerfile
I tried to get this to work with sha512.
This isn't as easy as this validation isn't just checking the downloaded bytes but deeply integrated with the caching system.
Eg. if you have a build that does: ADD https://someurl foo.tar and save cache for that build we will save the digest of that artifact in the build cache so next time we just need to recompute the digest (there is etag caching as well) and can reuse the cache when it has not changed. But if next time your command is ADD --checksum=sha256:abc https://someurl foo.tar and the provided checksum matches the previous pull we don't need to check anything and can just match the cache directly. But if multiple algorithms are used that would mean that when we save cache for such operations we would need to compute the checksums with many algorithms that would be wasteful.
Got it. Thanks. Makes sense. I appreciate the extra insight.