
cache of child imgs gets busted w/o change in `COPY`d file in parent image

Open maxheld83 opened this issue 3 years ago • 0 comments

I've run into what seems like unexpected behaviour (or bug?) which can be observed only under (apparently) very specific circumstances.

As far as I was able to find out in several days of debugging, these conditions all need to be met:

  • [x] There's a "parent" (base) image, which has some COPY foo.txt foo.txt instruction. If you don't COPY (or ADD presumably) anything from the host filesystem, the issue goes away.
  • [x] The "parent" image is built as a contexts = {parent = "target:base"} kind of "dependency" (new-ish feature in buildx) of a "children" (app) image. If you build the parent image by itself, there's a cache hit as expected.
  • [x] You cache to/from type=registry. (Local use is not affected).
  • [x] Between docker buildx bake invocations, a new docker buildx instance is created (such as on a new GitHub workflow, but also by re-running the setup buildx action, as in the below reprex). A bunch of other things don't lead to the cache being busted: new checkout, added file, changed mtime of the COPYd file ...

Take any of these away, and the problem goes away.
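
For readers who don't want to open the repo, the conditions above correspond roughly to a bake file like the following. This is only a sketch and not the file from the reprex repo: the Dockerfile names and the `CACHE_REF` registry reference are placeholders I made up for illustration. Only the `base` and `app` target names, the `contexts = {parent = "target:base"}` wiring, and the `type=registry` cache come from the description above.

```hcl
# docker-bake.hcl (hypothetical sketch, not the file from the linked repo)

variable "CACHE_REF" {
  # Placeholder registry reference; an assumption for illustration only.
  default = "registry.example.com/demo/cache"
}

# "Parent" image. Its Dockerfile is assumed to contain a COPY from the host,
# e.g. `COPY foo.txt foo.txt`.
target "base" {
  dockerfile = "base.Dockerfile" # assumed file name
  cache-from = ["type=registry,ref=${CACHE_REF}:base"]
  cache-to   = ["type=registry,ref=${CACHE_REF}:base"]
}

# "Child" image. Its Dockerfile is assumed to start with `FROM parent`.
target "app" {
  dockerfile = "app.Dockerfile" # assumed file name
  # The parent is wired in as a named context pointing at another bake target.
  contexts = {
    parent = "target:base"
  }
  cache-from = ["type=registry,ref=${CACHE_REF}:app"]
  cache-to   = ["type=registry,ref=${CACHE_REF}:app"]
}
```

With a file like this, running `docker buildx bake app` twice against the same builder would be expected to hit the cache on the second run; the problem described here shows up when a fresh builder instance is created between the two invocations.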

(Unfortunately, any reasonably complex container-ops CI process which leverages contexts = will probably run into this combination)

Expected behavior:

  • Unless foo.txt is changed, or other instructions in the "parent" image change, it remains the same, and thus the "children" image has no reason to bust its cache.

Observed behavior:

As soon as you set up a new docker buildx instance, it's as if the "parent" had somehow changed, and though it is itself rebuilt from cache, its children all get busted.

I couldn't think of a way to create a simple reproducible script, b/c this appears to need a live registry to test against, so I created a minimal repo instead.

The offending GH action result is here.

[Image: cache]

(This is a contrived example; the actual problem I'm facing is, of course, between GHA runs, not within them -- but the repeated docker buildx action provokes the same problem, probably for the same reason).

As you can see from the run times, nothing busts the cache after the initial run (the steps are just sleeps) -- except for the Set up Docker Buildx (again) step.

Weird Stuff:

  • When the parent (base) gets built as a dependency of the child image (app), it's a full cache hit, even after the COPY instruction (without which the whole problem goes away):

    lines 77ff:

    #14 [base 2/4] RUN echo "sleeping in base (parent) ..."; sleep 10; echo "done"    
    #14 CACHED
    
    #15 [base 3/4] COPY docker-bake.hcl docker-bake.hcl
    #15 CACHED
    
    #16 [base 4/4] RUN echo "sleeping in base (parent) after file is COPYed ..."; sleep 10; echo "done"
    #16 CACHED 
    

    So docker buildx seems to think that the parent image (base) is unchanged ... (that's what I'd expect -- the `COPY`d file hasn't changed, nor has anything else).

  • but then downstream, when the child (app) gets built, it's as if the FROM had changed ...

    lines 147ff:

    #21 [app 1/2] RUN echo "sleeping in app (child)..."; sleep 10; echo "done"
    #0 0.052 sleeping in app (child)...
    #21 10.05 done
    #21 DONE 10.8s
    

    I would have expected this sleep to be a cache hit, and not an actual ... sleep.


Could be a moby/buildkit issue and/or related to:

  • #1219

maxheld83, Jul 29 '22 22:07