
Reproducibility is broken when re-building the exact same image multiple times because sometimes the `moby.buildkit.cache.v0` entry changes

Open skirsten opened this issue 3 years ago • 2 comments

The problem is that after a few re-builds the inline cache value (base64) changes. Nothing else changes except that the chains object is removed.

I don't know how to explain that, but it's super easy to reproduce (on both latest and master buildkit):

test/Dockerfile that just copies something

FROM busybox@sha256:98de1ad411c6d08e50f26f392f3bc6cd65f686469b7c22a85c7b5fb1b820c154

# do some stupid copy
COPY --link --from=alpine@sha256:9b2a28eb47540823042a2ba401386845089bb7b62a9637d55816132c4c3c36eb /bin/ls /bin/ls

reproduce.sh

img="ghcr.io/skirsten/tmp:test-loop-4" # Change or remove this image between tests

for j in {1..10}; do
  buildctl prune >/dev/null

  echo "building..."
  buildctl build --frontend dockerfile.v0 --local context=test --local dockerfile=test \
    --import-cache type=registry,ref=$img \
    --export-cache type=inline \
    --output type=image,name=$img,push=true \
    --metadata-file metadata.json 2>/dev/null

  digest=$(jq -r '."containerimage.digest"' metadata.json)
  config_digest=$(jq -r '."containerimage.config.digest"' metadata.json)

  echo "digest: $digest"
  echo "config_digest: $config_digest"

  crane config "$img@$digest" | jq -r '."moby.buildkit.cache.v0"' | base64 -d | jq . >new.json

  diff old.json new.json || true
  mv new.json old.json

  echo

  sleep 1
done

Output:

building...
digest: sha256:630b2de48a2e51079cec38c002f0c8bc3820b859557963c9fbc79e0d4697ecb1
config_digest: sha256:7231e8e691bad124e4d170998c4b566487a1c61deeffa4a74c5c441138ca640f

building...
digest: sha256:630b2de48a2e51079cec38c002f0c8bc3820b859557963c9fbc79e0d4697ecb1
config_digest: sha256:7231e8e691bad124e4d170998c4b566487a1c61deeffa4a74c5c441138ca640f

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5
33,40d32
<     "chains": [
<       {
<         "layers": [
<           1
<         ],
<         "createdAt": "0001-01-01T00:00:00Z"
<       }
<     ],

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

building...
digest: sha256:9c703d1e490f1e6dfbfde75e6be0a1ee50ea9de7072e10b422c331820b1c50fe
config_digest: sha256:d915739f055f10545f797706423c52f88b5cddd429cb40640b45a03c0f33b8e5

As can be seen, the third build removes the chains object from the base64-encoded cache. This causes the config_digest to change, which in turn changes the digest. The build number at which it breaks seems random to me. This breaks reproducibility.
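To illustrate why dropping one field is enough to change both digests, here is a minimal sketch. The JSON strings are hypothetical, heavily trimmed stand-ins (real configs carry the cache entry base64-encoded under `moby.buildkit.cache.v0`), and the variable names are made up for the example. It only relies on the fact that image configs and manifests are addressed by the SHA-256 of their bytes:

```shell
#!/bin/sh
# Hypothetical, trimmed stand-ins for two image configs: one whose cache
# entry still contains "chains", and one where it has been dropped.
config_with='{"cache":{"layers":[1],"chains":[{"layers":[1]}]}}'
config_without='{"cache":{"layers":[1]}}'

# A config blob is identified by the SHA-256 of its exact bytes.
digest_with=$(printf '%s' "$config_with" | sha256sum | cut -d' ' -f1)
digest_without=$(printf '%s' "$config_without" | sha256sum | cut -d' ' -f1)

# The manifest embeds the config digest, so its own digest changes as well.
manifest_with="{\"config\":{\"digest\":\"sha256:$digest_with\"}}"
manifest_without="{\"config\":{\"digest\":\"sha256:$digest_without\"}}"
manifest_digest_with=$(printf '%s' "$manifest_with" | sha256sum | cut -d' ' -f1)
manifest_digest_without=$(printf '%s' "$manifest_without" | sha256sum | cut -d' ' -f1)

echo "config with chains:      sha256:$digest_with"
echo "config without chains:   sha256:$digest_without"
echo "manifest with chains:    sha256:$manifest_digest_with"
echo "manifest without chains: sha256:$manifest_digest_without"
```

The layers themselves are untouched in this scenario, which matches what the diff shows: only the metadata differs.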

Pretty sure this is a bug. Any help is appreciated, thanks :)

skirsten avatar Aug 08 '22 23:08 skirsten

We're running into a similar issue. Images are identical except for these changes to the moby.buildkit.cache.v0 entry. Here's the diff of the base64 decoded entry on both images:

https://gist.github.com/pkwarren/a3f20b1f409d52e66b26ee68011f990c

pkwarren avatar Aug 25 '22 19:08 pkwarren

I updated the title, as this seems to be more generic and affects fields other than chains in the cache entry. Given the random nature of this bug, I would hope it's simply some kind of race condition (or missing sorting of async outputs) and can be fixed easily.

I unfortunately do not have the time to dig into the code and try to find it myself so any help is appreciated!

skirsten avatar Aug 28 '22 11:08 skirsten

I may have a related (or the same) issue:

An image build for v2 using --cache-from v1 (previously built with inline cache) does not reuse the cache. The next build, v3, using the cache from v2, does use the cache. This repeats every second image.

I found the difference in moby.buildkit.cache.v0:

Image that works as --cache-from has:

{
    "layers": [
      {
        "layer": 12,
        "createdAt": "2022-09-28T08:30:41.439375309Z"
      }
    ],
    "digest": "sha256:13c946961fa6795c59a5ec1f3ce23075ea8bd73056c4b837765a536fa85c7a92",
    "inputs": [
      [
        {
          "link": 6
        }
      ],
      [
        {
          "link": 19
        }
      ]
    ]
  },
  ...

while the next image, which does not work as --cache-from, is missing the layers and has different numbers in the link values:

{
    "digest": "sha256:13c946961fa6795c59a5ec1f3ce23075ea8bd73056c4b837765a536fa85c7a92",
    "inputs": [
      [
        {
          "link": 5
        }
      ],
      [
        {
          "link": 18
        }
      ]
    ]
  },

Diff:

<     "layers": [
<       {
<         "layer": 12,
<         "createdAt": "2022-09-28T08:30:41.439375309Z"
<       }
<     ],
13c7
<           "link": 6
---
>           "link": 5
18c12
<           "link": 19
---
>           "link": 18
28c22
<           "link": 5
---
>           "link": 4
33c27
<           "link": 23
---
>           "link": 22
39,44d32
<     "layers": [
<       {
<         "layer": 5,
<         "createdAt": "2022-09-28T08:30:33.097788126Z"
<       }
<     ],

I tried to reproduce it, and it only happens when the build args change. When the build args are the same, the caches are reused correctly.

DocX avatar Sep 28 '22 10:09 DocX

Your image reproducibility shouldn't depend on whether you are using a cache or not. Instead you should be targeting getting reproducible images with --no-cache, at which point it doesn't matter if you use a cache or not in following builds as the resulting image will be reproducible either way. Buildkit now supports SOURCE_DATE_EPOCH, and by using this with multi-stage builds, COPY --link, RUN find /dir/to/be/copied/into/image -print0 | xargs -0 touch --no-dereference --date="@${SOURCE_DATE_EPOCH}", and pinning your apt/dnf/apk dependencies it should be possible to create FULLY reproducible images that have the same image digest on every build.
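The timestamp-normalization step mentioned above can be tried outside of a build. This is a minimal sketch, assuming GNU findutils and coreutils (the `--no-dereference` and `--date` flags of `touch` and `stat -c` are GNU options and may be missing from minimal base images). The epoch value and the temporary directory are placeholders chosen for the example:

```shell
#!/bin/sh
set -eu

# Assumed example value; in practice it could come from the last commit time.
SOURCE_DATE_EPOCH=1660000000

# Throwaway directory standing in for /dir/to/be/copied/into/image.
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"
echo "payload" > "$workdir/bin/tool"
ln -s tool "$workdir/bin/tool-link"

# Reset the mtime of every entry (symlinks themselves, not their targets).
find "$workdir" -print0 | xargs -0 touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"

# stat does not follow symlinks by default, so this reads the link's own mtime.
file_mtime=$(stat -c %Y "$workdir/bin/tool")
link_mtime=$(stat -c %Y "$workdir/bin/tool-link")
echo "file mtime: $file_mtime"
echo "link mtime: $link_mtime"

rm -rf "$workdir"
```

Inside a Dockerfile, the same pipeline would run in a build stage before the normalized directory is copied into the final image.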

ReillyBrogan avatar Sep 15 '23 16:09 ReillyBrogan