docker-tools icon indicating copy to clipboard operation
docker-tools copied to clipboard

Image caching logic doesn't take into account dependencies from intermediate stages

Open mthalman opened this issue 1 year ago • 4 comments

There is a scenario where the image caching logic incorrectly skips building a Dockerfile when that Dockerfile has an intermediate stage based on an image that has a cache miss. An example of this scenario is the this build (internal link). That build was triggered from this PR: https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1207. In that PR, a change was made only to the crossdeps-builder Dockerfile. The build only ended up publishing that crossdeps-builder image and none of the images that are dependent on it.

Here is an example Dockerfile with such a dependency: https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/642ddfa47486760a15781758c7e8c0709c4c96c0/src/azurelinux/3.0/net9.0/cross/amd64/Dockerfile#L3. The image caching logic only accounts for the dependency of the final image. In this case, the final image is based on crossdeps-llvm, which has not changed. So the cross Dockerfile never gets built. But it needs to be built because the content it contains is copied from crossdeps-builder, which has changed.

mthalman avatar Oct 02 '24 18:10 mthalman

Instead of adding lots of complexity to our custom caching mechanism, maybe it's time to evaluate using Docker's built-in caching. There have been lots of changes since BuildKit became the default. If we store the cache on the registry, Docker should cache most or all layers using the normal cache invalidation rules, which includes base image changes and individual Dockerfile instructions. If we use the "max" cache option, then it'll also store intermediate stages/layers on the registry.

lbussell avatar Oct 02 '24 18:10 lbussell

Instead of adding lots of complexity to our custom caching mechanism, maybe it's time to evaluate using Docker's built-in caching.

Maybe. But that defeats the whole purpose of https://github.com/dotnet/docker-tools/pull/1449. 🤷‍♂️

mthalman avatar Oct 02 '24 18:10 mthalman

[Triage] One way to fix this would be to include all FROM images in the Dockerfile in the image-info file on https://github.com/dotnet/versions. Then we could re-build when any of those images change. However, this is not a small amount of work for this one scenario that really only affects the buildtools-prereqs repo. Therefore, we should close this as not planned for now, and consider re-opening in the future if it becomes a bigger issue.

lbussell avatar Oct 07 '24 18:10 lbussell

[Triage] This would still be high-cost to fix, still only applies to the buildtools prereqs repo, and still has valid workarounds. A lower-cost solution would be to write a guide/workflow on the buildtools-prereqs repo for handling these scenarios.

To validate these scenarios in PRs, it would also be good to add a no-cache build leg to that repo. Special care should be taken to avoid spending too much compute time with this.

lbussell avatar Jan 13 '25 19:01 lbussell

This has been on the sprint column for 5 months. I am moving it back to current release. @mthalman I see this is on two boards at once. Should it be removed from the .NET Docker board?

lbussell avatar Jun 11 '25 00:06 lbussell

Yeah, I think it's fine to remove it since it only applies to the buildtools repo.

mthalman avatar Jun 11 '25 14:06 mthalman

This continues to be a problem - the latest images produced from https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1516 don't have clang 21.1.3. (The crossdeps-builder images do, but the dependent images like crossdeps-llvm-amd64, cross-amd64, etc don't as described in the issue above.) I don't think it's an acceptable solution to require every version update in crossdeps-builder to touch the dependent dockerfiles. Could we reconsider addressing this? @richlander

sbomer avatar Oct 16 '25 17:10 sbomer

If I were going to fix this today, I'd probably add a new FromImages property to PlatformData alongside what already exists.

https://github.com/dotnet/docker-tools/blob/46e34e7be617899ad8e243a66ccf60c03786adac/src/ImageBuilder/Models/Image/PlatformData.cs#L35-L36

You would need to add more logic to BuildCommand.cs to fill out the FromImages property in PlatformData. I would probably add it to SetPlatformDataBaseDigest since it already does something similar:

https://github.com/dotnet/docker-tools/blob/46e34e7be617899ad8e243a66ccf60c03786adac/src/ImageBuilder/Commands/BuildCommand.cs#L257-L285

Then, we would need to update ImageCacheService.cs to invalidate the cache when any of the FROM images were changed, not just the base image digest.

This would also probably get us some rudimentary automated re-builds for FROM scratch images. Related: https://github.com/dotnet/dotnet-docker/issues/1455

lbussell avatar Oct 16 '25 17:10 lbussell