buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

dockerfile: implement `ADD --checksum=<checksum> <http src> <dest>`

Open AkihiroSuda opened this issue 3 years ago • 9 comments

e.g.,

# syntax=docker/dockerfile-upstream:master
# ↑ Will be included in `syntax=docker/dockerfile:1.5`

FROM scratch
ADD --checksum=sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz /

Fix #975

AkihiroSuda avatar Sep 08 '22 07:09 AkihiroSuda

Oh, this is great! Thanks for working on this! I was just discussing this feature with @crazy-max last week.

Following the related ticket, I was thinking about this, and I can see potential use-cases that would also want to use this for local files (so also for COPY?), thinking of cases like;

A local Dockerfile used to build from a build-context elsewhere; in that case the Dockerfile wants to check that the file in the "elsewhere" context has the expected checksum;

docker build -f ./Dockerfile /path/to/build-context/

docker build -f ./Dockerfile https://github.com/foo/bar.git


docker build -f- https://github.com/foo/bar.git <EOF
FROM foo
ADD --checksum=$CHECKSUM file.tar.gz .
EOF

Second thought (but perhaps feature creep / YAGNI); do we expect other verification methods to be added? If so, do we want this to be more generic (--verify)?

Thinking of;

FROM scratch

# verify checksum
ADD --verify=checksum=sha256:<checksum> some-file .

# verify GPG signed
ADD --verify=gpg=<GPG key> some-file .

thaJeztah avatar Sep 08 '22 07:09 thaJeztah

potential use-cases that would also want to use this for local files (so also for COPY?)

Yes, but checksum verification is currently not implemented in the FileOp.

do we expect other verification methods to be added? If so, do we want this to be more generic (--verify)?

I think that will be in the source policy engine (OPA/Rego?) https://github.com/moby/buildkit/pull/2943#discussion_r964712652

AkihiroSuda avatar Sep 08 '22 07:09 AkihiroSuda

Yes, but checksum verification is currently not implemented in the FileOp.

I'm not too familiar with all the internals, so my comment is purely "user perspective";

  • Is it (technically) possible? (if not now, then in future?)
  • What happens with the current PR if you try to use --checksum for a local file (with ADD)?

thaJeztah avatar Sep 08 '22 07:09 thaJeztah

  • Is it (technically) possible? (if not now, then in future?)

Yes, but probably in a separate PR.

  • What happens with the current PR if you try to use --checksum for a local file (with ADD)?

Returns errors.New("checksum can't be specified for non-HTTP sources")

AkihiroSuda avatar Sep 08 '22 07:09 AkihiroSuda

Yes, but probably in a separate PR. Returns errors.New("checksum can't be specified for non-HTTP sources")

Thanks! Expanding in future is fine (as long as we indeed have a good error message before that).

I'll let others give their input as well on --checksum vs --verify 👍

thaJeztah avatar Sep 08 '22 07:09 thaJeztah

This could be handy but I wonder if we could not put instead the chesksum as suffix after @ char like we do with the image digest:

FROM scratch
COPY --from=moby/buildkit:latest@sha256:67c9251f9f2e103e1ee489b6cead518b6d82607ef485d3f1505fc4095a55ebeb /usr/bin/buildkitd /usr/bin/

So it would not require an extra flag?:

# syntax=docker/dockerfile-upstream:master
# ↑ Will be included in `syntax=docker/dockerfile:1.5`

FROM scratch
ADD https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz@sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d /

Looking at the RFC

Reserved:

Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.

@ is a reserved char and should be encoded so it might be ok to use it as separator.

crazy-max avatar Sep 20 '22 13:09 crazy-max

What about the cases like:

ADD https://..../releases/release-$TARGETOS-$TARGETARCH.tar.gz /

Is there some pattern, how checksums could be validated for this case?

tonistiigi avatar Sep 20 '22 18:09 tonistiigi

https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz@sha256:24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d

This can be misinterpreted like

proto=https
username=mirrors.edge.kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz
host=sha256
port=24454f830cdb571e2c4ad15481119c43b3cafd48dd869a9b2945d1036d1dc68d

AkihiroSuda avatar Sep 22 '22 22:09 AkihiroSuda

What about the cases like:

ADD https://..../releases/release-$TARGETOS-$TARGETARCH.tar.gz /

Is there some pattern, how checksums could be validated for this case?

This should work?

FROM scratch AS foo-linux-amd64
ADD --checksum=sha256:deadbeef https://..../releases/release-linux-amd64.tar.gz /

FROM scratch AS foo-freebsd-riscv64
ADD --checksum=sha256:cafebabe https://..../releases/release-freebsd-riscv64.tar.gz /

ARG TARGETOS
ARG TARGETARCH
FROM foo-$TARGETOS-$TARGETARCH AS foo

AkihiroSuda avatar Sep 22 '22 22:09 AkihiroSuda

LGTM but maybe we should keep it in labs for a release in case there is a need for something like --verify as well.

Do you think it should be --verify=checksum=XXX (https://github.com/moby/buildkit/pull/3093#issuecomment-1240337225) to be prepared for that, or "unlikely" that other options will be added?

thaJeztah avatar Oct 04 '22 13:10 thaJeztah

LGTM but maybe we should keep it in labs for a release in case there is a need for something like --verify as well.

Do you think it should be --verify=checksum=XXX (#3093 (comment)) to be prepared for that, or "unlikely" that other options will be added?

If we adopt this syntax, can we skip the labs?

AkihiroSuda avatar Oct 08 '22 01:10 AkihiroSuda

If we adopt this syntax, can we skip the labs?

I would prefer not. I don't actually see how the --verify could be extended atm. For something like the GPG signature check, you would need to provide a trust chain with the full public key so I don't know how that flag could be used in its current form. If the key pulled from keyserver based on fingerprint then I guess the --verify syntax could work (or it could as well be --pgp then) but then we have a dependency on network access.

tonistiigi avatar Oct 12 '22 04:10 tonistiigi

If we don't expect other options to be added, then --checksum SGTM

@AkihiroSuda any reason you put this one back in draft?

thaJeztah avatar Oct 12 '22 16:10 thaJeztah

@AkihiroSuda any reason you put this one back in draft?

For moving this to the labs channel.

AkihiroSuda avatar Oct 12 '22 17:10 AkihiroSuda

Moved to the labs. This should be ready to merge now.

AkihiroSuda avatar Oct 13 '22 01:10 AkihiroSuda

@AkihiroSuda check the linter

tonistiigi avatar Oct 13 '22 02:10 tonistiigi

Fixed

AkihiroSuda avatar Oct 13 '22 02:10 AkihiroSuda

I tried to get this to work with sha512. It doesn't seem to. Any plans to enable the longer digest length?

I was wanting to rewrite this Dockerfile with this new syntax. I was able to with sha256 but not sha512.

https://github.com/dotnet/dotnet-docker/blob/c02e853354cba294b3ac934ef0c5b57a2109811e/src/runtime/8.0/bookworm-slim/amd64/Dockerfile

richlander avatar Aug 07 '23 22:08 richlander

I tried to get this to work with sha512.

This isn't as easy as this validation isn't just checking the downloaded bytes but deeply integrated with the caching system.

Eg. if you have a build that does: ADD https://someurl foo.tar and save cache for that build we will save the digest of that artifact in the build cache so next time we just need to recompute the digest (there is etag caching as well) and can reuse the cache when it has not changed. But if next time your command is ADD --checksum=sha256:abc https://someurl foo.tar and the provided checksum matches the previous pull we don't need to check anything and can just match the cache directly. But if multiple algorithms are used that would mean that when we save cache for such operations we would need to compute the checksums with many algorithms that would be wasteful.

tonistiigi avatar Aug 09 '23 07:08 tonistiigi

Got it. Thanks. Makes sense. I appreciate the extra insight.

richlander avatar Aug 09 '23 18:08 richlander