buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

RFC: dockerfile: support `Dockerfile.sum` for pinning sources

Open AkihiroSuda opened this issue 3 years ago • 26 comments
trafficstars

EDIT: the current PR:

  • https://github.com/moby/buildkit/pull/2943

Dockerfile.sum is an equivalent of go.sum but s/go/Dockerfile/ . The content is a subset of BuildInfo:

{
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ]
}

When Dockerfile.sum exists in the context, the Dockerfile builder does:

  • Pinning the digest of docker-image sources (FROM ...)
  • Pinning the digest of http sources (ADD https://...)
  • Recording the consumed entries to the build info structure (["containerimage.buildinfo"].consumedPin)

In the future, Dockerfile should also support ADD <gitref>. and pinning its commit hash.

  • https://github.com/moby/buildkit/issues/2799

POC

https://github.com/AkihiroSuda/buildkit_poc/commits/pin-poc.20220411-0

:warning: The filename and the format of the "Dockerfile.sum" are subject to change.

$ cat Dockerfile
FROM alpine
ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md

$ cat Dockerfile.sum 
{
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ]
}

$ sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 3.0s (6/6) FINISHED                                                                                                           
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                      2.8s
 => [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md                                                                 0.0s
 => CACHED [2/2] ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md                                     0.0s

$ cat metadata.json 
{
  "containerimage.buildinfo": {
    "frontend": "dockerfile.v0",
    "sources": [
      {
        "type": "docker-image",
        "ref": "docker.io/library/alpine:latest",
        "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      },
      {
        "type": "http",
        "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
        "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
      }
    ],
    "consumedPin": {
      "digest": "sha256:42b78052859819b268e047da95512b20d2e64991d662e4af9f286d743f20b2d4",
      "sources": [
        {
          "type": "docker-image",
          "ref": "docker.io/library/alpine:latest",
          "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
        },
        {
          "type": "http",
          "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
          "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
        }
      ]
    }
  }
}

When a docker-image pin is wrong:

$sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 1.6s (3/3) FINISHED                                                                                                           
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => ERROR [internal] load metadata for docker.io/library/alpine:latest                                                                1.4s
------
 > [internal] load metadata for docker.io/library/alpine:latest:
------
Dockerfile:1
--------------------
   1 | >>> FROM alpine
   2 |     ADD https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md /README.md
   3 |     
--------------------
error: failed to solve: alpine: docker.io/library/alpine:latest@sha256:fedbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454: not found

When an http pin is wrong:

$ sudo buildctl build --frontend dockerfile.v0  --local dockerfile=. --local context=. --metadata-file metadata.json 
[+] Building 0.6s (5/6)                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                  0.0s
 => => transferring dockerfile: 603B                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                     0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                      0.0s
 => [1/2] FROM docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => => resolve docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454                       0.0s
 => ERROR https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md                                                           0.3s
------
 > https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md:
------
error: failed to solve: digest mismatch sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53: sha256:fe4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53

AkihiroSuda avatar Apr 10 '22 15:04 AkihiroSuda

@tonistiigi @crazy-max @ktock PTAL 🙏

AkihiroSuda avatar Apr 10 '22 15:04 AkihiroSuda

Looks like a nice way to provide/show the buildinfo feature to the end user, thanks @AkihiroSuda!

Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).

Atm metadata resolution is not cached in the frontend (I think it uses the containerimage metaresolver which lacks of cache support). I recall about https://github.com/moby/buildkit/issues/706 and wonder if we should not look first at improving the cache experience on this side and maybe align it with your proposal.

I will think a bit more about your proposal but sgtm!

crazy-max avatar Apr 10 '22 16:04 crazy-max

This looks awesome.

I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.

dlorenc avatar Apr 10 '22 16:04 dlorenc

Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.

crazy-max avatar Apr 10 '22 16:04 crazy-max

Some things to consider here:

BuildInfo is a result for a specific build, not a specific Dockerfile. A Dockerfile can have lots of targets depending on different images. And with the latest Buildx you can have a single build (one BuildInfo) from multiple Dockerfiles. If sum file is created manually, this isn't really a big issue as it only affects cases where different targets would use different pins. But I think the intent is to use sums from a previous build or generate this file.

BuildInfo in v0.10 also contains Attrs field that is opt-in for now, not only sources. We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.

Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well. We already added --build-context that includes lots of this functionality. You can already do a build that sends the pins for previous images, and they are used even if image tags have been updated. This only works for images atm though (and build stages), not git/http URLs etc. and assumes that client-side can already attach the pins to the build request. As an example, @crazy-max and I discussed possibility for --build-context <image-ref-with-embedded-buildinfo> and --build-context <previous-build-metadata-file-with-buildinfo>.json. This would load the pins (and config) from a previous build.

I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.

tonistiigi avatar Apr 10 '22 21:04 tonistiigi

@crazy-max

Using the ResolveImageConfig interface is clever so we have a common ground. I'm just afraid of the fallback to ImageMetaResolver. I think in case it falls back, it will have already do a metadata resolution in the frontend that could be useless (and could potentially increase rate-limit on Docker Hub).

The current POC does not increase the numbers of HTTP request AFAICS, and I don't see increased possibility of hitting the rate limit.

Wonder if we could also think about a docker build(x) tidy cmd to add new requirements and fix current deps.

:+1:


@dlorenc

I'd just note that it more closely resembles a lock file rather than a go.sum file to me. They're very close so it's mostly just a nitpick. go.sum serves a few additional purposes beyond just pinning, related to the transparency log and version resolution.

Is it better to change Dockerfile.sum to something else like Dockerfile.pin or Dockerfile.lock ?


@tonistiigi

We should think if there is a solution that isn't limited to the pinned sources but also allows to load the configuration from the previous build.

Maybe we should have a version field or mediaType field for extensibility?

Is this meant to be a LLB level or frontend level(Dockerfile) concept? Given it is loaded from Dockerfile, I guess frontend, but the pinning concept could be generalized in LLB as well.

Frontend, because the image meta resolver has to be called before generating LLB. The pin format is designed to be portable to other frontends though.

I didn't quite get the consumedPin thing. If I build with pins from a previous source, ideally, I would expect to get an identical image. So I would want the new BuildInfo to be the same as the previous one without any extra keys.

Should this be moved to exporter metadata?

AkihiroSuda avatar Apr 11 '22 10:04 AkihiroSuda

Sounds very cool :+1:

https://github.com/AkihiroSuda/buildkit_poc/commit/99071c352a6154262f73b453e547a0b0adac4839#diff-50c7e2f4bb8c30ace7a5a6d2858ade67023db56d633f887b26768afae38ece7dR76

// HTTPChecksum returns the checksum if url is present in a.Pin. // Otherwise returns an empty string without an error.

Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)

And +1 to @crazy-max 's docker build(x) tidy Maybe, Dockerfile or something can provide the user a way to declare how to acquire a checksum value of a dependency (e.g. downloading SHA256SUMS file from github release page).

ktock avatar Apr 11 '22 13:04 ktock

Would something like this help? https://twitter.com/_ctlfsh/status/1250596462502703104

nishakm avatar Apr 11 '22 15:04 nishakm

Cosign has also explored adding a CLI to resolve image references in a Dockerfile so they're more reproducible/verifiable: see https://github.com/sigstore/cosign/issues/707 and PR https://github.com/sigstore/cosign/pull/1120

If there was one reliable, canonical way to do this, that would be a huge improvement.

imjasonh avatar Apr 11 '22 17:04 imjasonh

Also considering something like a Dockerfile.mod where such locks could be added (as they are in go.mod), however a separate .lock file may be good here just because many times you don't want to actually pin resources (e.g. images like ubuntu:18.04 you probably want the most up to date version).

cpuguy83 avatar Apr 11 '22 17:04 cpuguy83

Just a thought: how much do you want to pin? Pinning the FROM image should be straightforward, but pinning the state of software at the time of build is much harder. In order to do tern lock Dockerfile, we had to pull out the created_by commands from the Dockerfile, pull out the environment variables from the config, do a variable expansion in the created_by string, and write it back. For individual components, we had to first inventory the image and then replace all instances of the package in the Dockerfile with a pinned dependency. This doesn't work for ADD/COPY statements after build, especially if you are copying specific files with no context.

nishakm avatar Apr 11 '22 17:04 nishakm

@ktock

Shouldn't we produce an error if there is a missing entry in Dockerfile.sum? (go.mod seems to do so)

No, not all entries have to be / can be pinned. Especially http source entries.


@nishakm @imjasonh The goal of this PR is to cover http sources (ADD https://...) and git sources (ADD git://...) as well as docker-image (FROM ...) sources. Tern and Cosign do not seem capable to support non-image sources.

AkihiroSuda avatar Apr 12 '22 04:04 AkihiroSuda

Opened PR:

  • ~https://github.com/moby/buildkit/pull/2816~ https://github.com/moby/buildkit/pull/2943

AkihiroSuda avatar Apr 20 '22 09:04 AkihiroSuda

I don't understand the purpose of this proposal. If you really want to pin the base image, you can already do this by referencing the image digest instead of the tag. Also, the image tag could come from a build arg, so I don't see how you could have one authoritative pin/lock/sum file in that kind of circumstance.

I think instead of this, there should just be a way to specify the digest as part of the ADD instruction itself, just like you can already do for FROM.

rittneje avatar Apr 23 '22 17:04 rittneje

@rittneje

There's a couple of reasons:

  1. Not having to change the source definition to pin the resources
  2. Not having to change the same definition in multiple places in the same source
  3. Applying the same pins across multiple builds.

cpuguy83 avatar Apr 23 '22 17:04 cpuguy83

@cpuguy83 I'm assuming by "source" here you mean the Dockerfile.

Not having to change the source definition to pin the resources

If you use build args for the pins then you don't need to change the Dockerfile per se, just whatever is supplying the args. Also, if you are downloading files within RUN via curl or the like, then already you should be doing something similar for sha256sum. So actually, it is preferable to have it all in the Dockerfile, rather than half in the Dockerfile and half in some new lock/pin/sum file.

Not having to change the same definition in multiple places in the same source

Build args already handle this too. For example:

ARG IMAGE_DIGEST=sha256:...

FROM ubuntu@${IMAGE_DIGEST} AS install
...

FROM ubuntu@${IMAGE_DIGEST} AS final
...

Applying the same pins across multiple builds.

I don't know what this means.

rittneje avatar Apr 23 '22 18:04 rittneje

Build args do not handle this. Build args require the author of the Dockerfile to add those args explicitly, and everywhere. It also requires the person performing the build to know what the args are. It also assumes the person building is calling "docker build" directly as opposed to something like "make build".

cpuguy83 avatar Apr 23 '22 19:04 cpuguy83

I don't know what this means.

Multiple dockerfiles or build targets in one repo.

cpuguy83 avatar Apr 23 '22 19:04 cpuguy83

@cpuguy83 One thing that is unclear to me from this proposal is what specifically happens when a tag gets changed. For example, suppose I do FROM python:3.8. At the time that resolved to sha256:abc123, so that gets recorded in the lock/sum/pin file. But now that tag is pointing to sha256:def456 in DockerHub. Does buildkit fail the build (because it pulls python:3.8 and now the digests don't match), or does it essentially rewrite FROM python:3.8 into FROM python@sha256:abc123? A similar situation unfolds for ADD git.

rittneje avatar Apr 23 '22 21:04 rittneje

If you have it pinned, it doesn't matter what digest the tag points to in DockerHub, you've pinned it to what you want it to be.

cpuguy83 avatar Apr 23 '22 22:04 cpuguy83

Does buildkit fail the build (because it pulls python:3.8 and now the digests don't match), or does it essentially rewrite FROM python:3.8 into FROM python@sha256:abc123?

Latter for regular tags, former for immutable tags

  • https://github.com/opencontainers/distribution-spec/pull/320

AkihiroSuda avatar Apr 25 '22 04:04 AkihiroSuda

Slightly related;

  • https://github.com/moby/buildkit/issues/975

thaJeztah avatar Apr 27 '22 07:04 thaJeztah

  • Feature request: Abbility to verify checksum with the ADD command #975

Yes, this proposal covers #975, but provides a generic way for multiple LLB source ops

AkihiroSuda avatar Apr 27 '22 07:04 AkihiroSuda

Yes, to some extend this can cover that feature request (but likely would require one to manually create the Dockerfile.sum (which also feels a bit awkward).

I haven't looked in-depth, but wondering how this proposal works with build-args (which can influence the build-steps, and thus what "checksum" to look for?), but I guess in that case it's an exercise to the user to create entries for every variant 🤔

thaJeztah avatar Apr 27 '22 08:04 thaJeztah

How would this work when working with a multi-arch base image? I haven't really worked with them, but my understanding is you would have multiple images with the same tag but different digests (one per os/arch).

rittneje avatar Apr 27 '22 12:04 rittneje

For multi arch it can refer to the manifest index (which is the "wrapper" tying together the image variants)

thaJeztah avatar Apr 27 '22 12:04 thaJeztah

We have source-policy-file now: https://github.com/moby/buildkit/blob/v0.12.2/docs/build-repro.md#reproducing-the-pinned-dependencies

{
  "rules": [
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "docker-image://docker.io/library/alpine:latest"
      },
      "updates": {
        "identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      }
    },
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md"
      },
      "updates": {
        "attrs": {"http.checksum": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"}
      }
    }
  ]
}

AkihiroSuda avatar Sep 14 '23 14:09 AkihiroSuda