cache of child imgs gets busted w/o change in `COPY`d file in parent image
I've run into what seems like unexpected behaviour (or bug?) which can be observed only under (apparently) very specific circumstances.
As far as I was able to find out in several days of debugging, these conditions all need to be met:
- [x] There's a "parent" (`base`) image, which has some `COPY foo.txt foo.txt` instruction. If you don't `COPY` (or `ADD`, presumably) anything from the host filesystem, the issue goes away.
- [x] The "parent" image is built as a `contexts = {parent = "target:base"}` kind of "dependency" (new-ish feature in buildx) of a "child" (`app`) image. If you build the parent image by itself, there's a cache hit, as expected.
- [x] You cache to/from `type=registry`. (Local use is not affected.)
- [x] Between `docker buildx bake` invocations, a new docker buildx instance is created (such as on a new GitHub workflow, but also by re-running the setup buildx action, as in the reprex below).

A bunch of other things don't lead to the cache being busted: new checkout, added file, changed mtime of the `COPY`d file ...
Take any of these away, and the problem goes away.
(Unfortunately, any reasonably complex container-ops CI process which leverages `contexts =` will probably run into this combination.)
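For readers without the repo at hand, the combination above corresponds roughly to a `docker-bake.hcl` like the following. This is only a sketch, not the actual file from the reprex: the registry ref (`CACHE_REF`), the Dockerfile names and the `mode=max` setting are assumptions for illustration; only the `base`/`app` target names, the `contexts` line and `type=registry` come from the description above.

```hcl
# Hypothetical cache location; point this at a registry you can push to.
variable "CACHE_REF" {
  default = "registry.example.com/reprex/cache"
}

# "Parent" image. Its Dockerfile (name assumed) contains a COPY from the host context.
target "base" {
  context    = "."
  dockerfile = "base.Dockerfile"
  cache-from = ["type=registry,ref=${CACHE_REF}:base"]
  cache-to   = ["type=registry,ref=${CACHE_REF}:base,mode=max"]
}

# "Child" image. Its Dockerfile (name assumed) starts with `FROM parent`,
# which bake resolves to the result of the `base` target.
target "app" {
  context    = "."
  dockerfile = "app.Dockerfile"
  contexts   = { parent = "target:base" }
  cache-from = ["type=registry,ref=${CACHE_REF}:app"]
  cache-to   = ["type=registry,ref=${CACHE_REF}:app,mode=max"]
}

group "default" {
  targets = ["app"]
}
```

With a file like this, running `docker buildx bake` twice on the same builder instance should hit the cache both times; the bust only shows up once a fresh builder instance is created between the two invocations.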
Expected behavior:
- Unless `foo.txt` is changed, or other instructions in the "parent" image change, it remains the same, and thus the "child" image has no reason to bust its cache.
Observed behavior:
As soon as you set up a new docker buildx instance, it's as if the "parent" had somehow changed: though it is itself rebuilt from cache, its children all get busted.
I couldn't think of a way to create a simple reproducible script, b/c this appears to need a live registry to test against, so I created a minimal repo instead.
The offending GH action result is here.
(This is a contrived example; the actual problem I'm facing is, of course, between GHA runs, not within them -- but the repeated docker buildx action provokes the same problem, probably for the same reason).
As you can see from the run times, nothing busts the cache (it's just sleeps) after an initial run -- except for the `Set up Docker Buildx (again)` step.
Weird Stuff:
- When the parent (`base`) gets built as a dependency of the child image (`app`), it's a full cache hit, even after the `COPY` instruction (without which the whole problem goes away):

  ```
  #14 [base 2/4] RUN echo "sleeping in base (parent) ..."; sleep 10; echo "done"
  #14 CACHED
  #15 [base 3/4] COPY docker-bake.hcl docker-bake.hcl
  #15 CACHED
  #16 [base 4/4] RUN echo "sleeping in base (parent) after file is COPYed ..."; sleep 10; echo "done"
  #16 CACHED
  ```

  So `docker buildx` seems to be thinking that the parent image (`base`) is unchanged ... (that's what I'd expect -- the `COPY`d file hasn't changed, nor has anything else).
- but then downstream, when the child (`app`) gets built, it's as if the `FROM` had changed ...

  ```
  #21 [app 1/2] RUN echo "sleeping in app (child)..."; sleep 10; echo "done"
  #0 0.052 sleeping in app (child)...
  #21 10.05 done
  #21 DONE 10.8s
  ```

  I would have expected this sleep to be a cache hit, and not an actual ... `sleep`.
could be a mobybuild issue and/or related to:
- #1219