
bug: buildkit does not consider ONBUILD COPY --from

Open astorath opened this issue 6 years ago • 44 comments

Hi, we make heavy use of the ONBUILD option in our environment.

Our typical Dockerfile looks like this:

FROM my.company.com/ci/dotnet:v1-build as build
FROM my.company.com/ci/dotnet:v1-runtime

This works fine in the legacy docker build, but turning on the BUILDKIT option activates optimizations, so Docker tries to build these stages in parallel. The problem is that the stages are dependent:

  1. my.company.com/ci/dotnet:v1-build:

    FROM base:v1
    WORKDIR /src
    ONBUILD COPY . /src
    ONBUILD RUN make clean all
    
  2. my.company.com/ci/dotnet:v1-runtime:

    FROM runtime:v1
    WORKDIR /app
    ONBUILD COPY --from=build /src/target /app
    

Running DOCKER_BUILDKIT=1 docker build . results in:

DOCKER_BUILDKIT=1 docker build .
[+] Building 0.8s (8/8) FINISHED
...
 => ERROR [stage-2 3/1] COPY --from=build /src/target/ /app 0.0s
------
 > [stage-2 3/1] COPY --from=build /src/target/ /app:
------
 not found: not found

My questions are:

  1. Can I turn off these optimizations to run builds sequentially?
  2. Is there a way to mark these stages as dependent explicitly?

astorath avatar Feb 06 '19 12:02 astorath

This is a very weird (and inventive) way to use COPY --from.

  • Can I turn off these optimizations to run builds sequentially?
  • Is there a way to mark these stages as dependent explicitly?

No, we don't want to add any special behavior for this. Either we consider this a bug and just fix the image resolution, or we document that this is invalid usage (and make it error with a proper message).

A problem with implementing this is that we can't start processing all stages in parallel like we do now, because we only know about the dependent images after we have pulled the config of the base. Still doable, just adds complexity. Once we have determined the correct dependencies of the stages, it would build in the regular concurrent manner.
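For reference, the ONBUILD triggers are stored in the image config, so they only become visible after the base image's config has been pulled. A minimal way to see them, using the runtime image name from the report above:

docker image inspect --format '{{json .Config.OnBuild}}' my.company.com/ci/dotnet:v1-runtime
# prints the stored triggers, e.g. ["COPY --from=build /src/target /app"]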

@AkihiroSuda @tiborvass @ijc @thaJeztah get your votes in

tonistiigi avatar Feb 06 '19 23:02 tonistiigi

IIRC ONBUILD was deprecated and unrecommended?

AkihiroSuda avatar Feb 07 '19 09:02 AkihiroSuda

It's not deprecated, but I think the official images stopped creating onbuild variants, due to the behaviour being confusing.

So, IIUC, in buildkit the problem is that the stages are executed in parallel, thus (e.g.)

FROM foo AS ubuntu

FROM something:onbuild AS final

Where something:onbuild has

ONBUILD COPY --from=ubuntu /foo /foo

Could either result in the COPY copying from ubuntu:latest or from the first build stage (depending on whether it's been evaluated first)?

thaJeztah avatar Feb 07 '19 13:02 thaJeztah

@tonistiigi

This is a very weird (and inventive) way to use COPY --from.

Well, this is not my invention; https://engineering.busbud.com/2017/05/21/going-further-docker-multi-stage-builds/ is one of the first Google search results, so I don't think I'm alone here.

@thaJeztah

It's not deprecated, but I think the official images stopped creating onbuild variants, due to the behaviour being confusing.

Maybe this is confusing for official images, but for multistage internal images designed for that purpose - ONBUILD is a revelation...

The link above describes a nice solution to the problem: if we could use something like:

ONBUILD FROM runtime:v1

we wouldn't need this strange hack.

astorath avatar Feb 07 '19 13:02 astorath

Quote from https://engineering.busbud.com/2017/05/21/going-further-docker-multi-stage-builds/:

I'm not sure if it's a bug, a feature, or a undefined behavior

As the author of the multi-stage PR I can confirm I was not clever enough to see it as a possible feature.

tonistiigi avatar Feb 07 '19 19:02 tonistiigi

Just chiming in here - we do the same as @astorath at our company. It's been a godsend for enabling extremely efficient / modular builds and Dockerfiles.

For context: We discovered this issue after trying to use (new) build secrets (with buildkit)

benvan avatar Oct 16 '19 16:10 benvan

Same problem here.

I cannot reference another stage, or even another image, with ONBUILD COPY --from :(

DavG avatar Feb 04 '20 14:02 DavG

Same problem here.

I cannot reference another stage, or even another image, with ONBUILD COPY --from :(

You can if you add a dummy COPY --from=<previous stage> /dummy /dummy (assuming /dummy exists in the source image). That will add an explicit dependency. It's a workaround, even if inconvenient.
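As an illustration (a minimal sketch only, reusing the stage names from the original report and assuming /src exists in the build stage after its ONBUILD triggers have run):

FROM my.company.com/ci/dotnet:v1-build as build

FROM my.company.com/ci/dotnet:v1-runtime
# Dummy copy whose only purpose is to make the dependency on the "build"
# stage explicit, so the builder schedules "build" before this stage
COPY --from=build /src /tmp/dummy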

sirlatrom avatar Feb 04 '20 14:02 sirlatrom

I know this trick to force some intermediate stages to build in "weird cases", but I don't see how it is related to the problem here.

DavG avatar Feb 04 '20 14:02 DavG

It does not even work with an external image.

Minimal example to reproduce:

template/Dockerfile

FROM php:7.4.2-fpm-buster
ONBUILD COPY --from=composer:1.9.2 /usr/bin/composer /usr/bin/composer

project/Dockerfile

FROM bug_template

build.sh

DOCKER_BUILDKIT=1 docker build ./template --tag bug_template
DOCKER_BUILDKIT=1 docker build ./project

@sirlatrom maybe I didn't get your trick? Do you see a solution here?

DavG avatar Feb 04 '20 14:02 DavG

The "trick" is to add a step in the bug image that copies a known-to-exist file from the composer image. That could also be done by creating a prior stage (from the composer image) in the bug image that adds such a dummy file so you can control which it is.

sirlatrom avatar Feb 04 '20 17:02 sirlatrom

OK, let's try it:

template/Dockerfile

FROM php:7.4.2-fpm-buster
COPY --from=composer:1.9.2 /etc/passwd /etc/passwd_composer
ONBUILD COPY --from=composer:1.9.2 /usr/bin/composer /usr/bin/composer

Same error.

DavG avatar Feb 04 '20 18:02 DavG

I meant the new copy instruction should be placed in the last, most downstream image.

sirlatrom avatar Feb 04 '20 18:02 sirlatrom

Could you provide a working fix based on my minimal example?

DavG avatar Feb 04 '20 18:02 DavG

Could you provide a working fix based on my minimal example?

Note that the use of images instead of stage names in COPY --from is not documented, so you should probably use stage aliases as shown below.

./template/Dockerfile:

FROM php:7.4.2-fpm-buster
ONBUILD COPY --from=template /usr/bin/composer /usr/bin/composer

./project/Dockerfile:

FROM composer:1.9.2 AS template
RUN ["touch", "/tmp/dummy"]

FROM bug_template
COPY --from=template /tmp/dummy /tmp/dummy

build.sh:

DOCKER_BUILDKIT=1 docker build ./template --tag bug_template
DOCKER_BUILDKIT=1 docker build ./project

sirlatrom avatar Feb 05 '20 10:02 sirlatrom

Same error @astorath

tuananh170489 avatar Jul 03 '20 02:07 tuananh170489

Facing the same issue as what @astorath described - we have ONBUILD build and ONBUILD runtime stages to streamline the Dockerfiles. Lack of this feature in BuildKit is a big roadblock to adoption for us 🙈

EricHripko avatar Aug 19 '20 16:08 EricHripko

Hey folks 👋 Docker Desktop 2.4.0.0 enables BuildKit by default now, which will potentially break any Dockerfiles dependent on this functionality. I'm happy to work with maintainers to get this addressed if capacity is an issue here (will possibly need some pointers on where to start though). It looks like @tonistiigi's suggestion could be a way forward:

  • BuildKit starts by pulling all the external FROMs
  • BuildKit checks for additional stage dependencies in ONBUILD instructions
  • BuildKit performs stage builds in parallel where possible (as before, but now accounting for --from)

EricHripko avatar Sep 30 '20 14:09 EricHripko

I ran into this problem and stumbled upon this issue. What is needed to move this fix forward?

bruth avatar Nov 15 '20 03:11 bruth

Two approaches to implement this:

  • Add conditions in the frontend so that, after ONBUILD is parsed, it looks for this special case. If new dependencies are detected, include them in the build (and process their ONBUILD instructions recursively).
  • Refactor the whole frontend to the new async features in the LLB client https://github.com/moby/buildkit/pull/1426 (look at the follow-ups listed there). This allows adding async dependencies to the graph that are pulled when needed. This is a much more versatile solution and enables other (imho more useful) composition features. But it requires quite a thorough understanding of how LLB is generated from a Dockerfile.

If you are hitting this, I also strongly advise you to check whether this is actually the correct solution for your problem. Pointing to a non-existent stage is against Dockerfile design, where every Dockerfile should be complete and not just a snippet that only works inside another specific Dockerfile. Maybe what you need instead are imports to connect Dockerfiles.

tonistiigi avatar Nov 15 '20 06:11 tonistiigi

This is also a bit of a security issue. If an image that is used in a Dockerfile gets updated to include ONBUILD COPY --from=, it could point to an image that the user considers private and does not want to leak, without the user having any way to verify that this kind of referencing happens.
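As a hedged illustration of that concern (all image names below are hypothetical):

# An updated base image could silently add a trigger such as:
ONBUILD COPY --from=registry.internal/private-image:latest /secrets /leaked

# A downstream Dockerfile that simply does
#   FROM public/base:latest
# would then read content from the private image during its own build,
# without the reference appearing anywhere in the downstream Dockerfile.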

tonistiigi avatar Nov 15 '20 06:11 tonistiigi

@tonistiigi Thanks.

If you are hitting this, I also strongly advise you to check whether this is actually the correct solution for your problem.

Yes, I am still evaluating whether this is a necessary approach. However, like the OP, what led me to think it was a bug was that it worked without BuildKit.

The use case I have is:

  • Creating a multi-stage Dockerfile for a dev/CI workflow on a per language basis
  • Each stage is built as a standalone image (using --target)
  • The package image refers to the build image in that original Dockerfile using ONBUILD COPY --from=build ...
  • If I run docker build with the following:
FROM my-image-build:latest as build
FROM my-image-package:latest as package

It fails even though (I thought) the ONBUILD COPY --from=build .. should resolve properly, since the build stage is declared here. But the way it resolves, it seems like the --from=.. resolution is not considering this build stage, whereas the non-BuildKit docker build works.

bruth avatar Nov 15 '20 12:11 bruth

If you are hitting this, I also strongly advise you to check whether this is actually the correct solution for your problem

I think this solution is born out of the need to create Dockerfiles for a lot of similarly structured projects combined with an attempt to follow best practices. Let's say you have a setup where:

  • build stage takes your source and produces some binaries
  • runtime stage copies/installs the binaries into the production image

This is, of course, trivial to achieve with multi-stage and that's the recommended approach (keeping build and runtime environments separate). However, this doesn't really scale well, as you'd need to repeat this same Dockerfile for 10, 20 or even 50 projects (not uncommon for systems powered by microservices).
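For reference, a sketch of the plain multi-stage Dockerfile that would be repeated per project (image names, paths and the build command are placeholders, loosely following the original report):

FROM my.company.com/ci/dotnet-sdk:v1 as build
WORKDIR /src
COPY . /src
RUN make clean all

FROM my.company.com/ci/dotnet-runtime:v1
WORKDIR /app
COPY --from=build /src/target /app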

Before BuildKit, (arguably) the neatest way to achieve this was with multi-stage ONBUILD files. It enables developers to write extremely modular/reusable Dockerfiles, whilst keeping Docker workflows native (docker/docker-compose build just works).

Given BuildKit's feature set, I agree that the solution no longer seems like a great fit. @tonistiigi, is the vision to fill this gap with custom BuildKit frontends?

If the path forward is via custom BuildKit frontends, it'd be great to make dockerfile.Build more easily extensible. Creating a custom frontend is quite an undertaking today, as it requires a lot of specialist knowledge (LLB) and boilerplate code (not to mention additional duplication if the end goal is to use Dockerfile syntax). It'd be awesome if there was a simpler API for this to match the learning curve of multi-stage Dockerfiles. For example, what I can imagine doing is:

  • Have a streamlined Dockerfile similar to below:
# syntax = dockerfiles/java-microservice
# Any additional (project-specific) customisations can be specified via familiar Dockerfile syntax
COPY ... 
RUN ... 
  • Have a custom frontend that uses some default pre-baked actions, making the effective Dockerfile look something like this:
FROM jdk as build
COPY . /src
RUN mvn ... # Build the application in a streamlined fashion

FROM jre
COPY --from=build /src/build /bin

# Any additional (project-specific) customisations specified in source
COPY ... 
RUN ... 
  • If there's an easy way to load files from the build context, this is even more powerful; you'd now be able to have a generic frontend that can combine Java/Python/etc. functionality in a single streamlined entrypoint:
# syntax = dockerfiles/microservice
# Any additional (project-specific) customisations can be specified via familiar Dockerfile syntax
COPY ... 
RUN ... 

What are your thoughts on this?

EricHripko avatar Nov 17 '20 13:11 EricHripko

I've written up a blog post to document some alternatives to ONBUILD COPY --from. These vary in flexibility (and, as a result, complexity) but are all compatible with BuildKit 🎉 Would love feedback from folks here on whether this works for their use cases and if it could be improved 🙂

EricHripko avatar May 18 '21 20:05 EricHripko

Is there any update on a solution for this? I've just discovered this limitation while building out some reusable images for a large project. I have multi-stage images which declare instructions such as ONBUILD COPY --from=other-image /foo /bar and also use the BuildKit secrets mechanism in ONBUILD instructions: ONBUILD RUN --mount=type=secret,id=my-secret,uid=101 source /run/secrets/my-secret && install-dependencies. The instructions are not being run in downstream image builds, which is a serious limitation. If it worked, it would allow me to reduce the amount of boilerplate and have a set of reusable layers for application images across a large project.
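For context, a sketch of how such a build-time secret is supplied on the host side (the secret id matches the instruction above; the source file path is a placeholder):

# Host-side build command; ./my-secret.env is a placeholder path
DOCKER_BUILDKIT=1 docker build --secret id=my-secret,src=./my-secret.env .

Note that RUN --mount=type=secret may also need a sufficiently recent Dockerfile syntax (e.g. a # syntax=docker/dockerfile:1.2 directive in older setups).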

EDIT: As I wrote the above I realized that I only use BuildKit for the secrets mechanism. If I can find an alternative I may drop BuildKit due to this limitation.

boonware avatar Nov 29 '21 14:11 boonware

Is there any update on a solution for this? I've just discovered this limitation while building out some reusable images for a large project. I have multi-stage images which declare instructions such as ONBUILD COPY --from=other-image /foo /bar and also use the BuildKit secrets mechanism in ONBUILD instructions: ONBUILD RUN --mount=type=secret,id=my-secret,uid=101 source /run/secrets/my-secret && install-dependencies. The instructions are not being run in downstream image builds, which is a serious limitation. If it worked, it would allow me to reduce the amount of boilerplate and have a set of reusable layers for application images across a large project.

EDIT: As I wrote the above I realized that I only use BuildKit for the secrets mechanism. If I can find an alternative I may drop BuildKit due to this limitation.

Here is the switch https://github.com/companieshouse/ch.gov.uk/pull/192

Just set the environment variable DOCKER_BUILDKIT=0 before running docker build.
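For example, using the build context from the original report:

DOCKER_BUILDKIT=0 docker build .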

softworm avatar Jan 10 '22 02:01 softworm

@softworm This does not solve the problem. Turning off BuildKit means you cannot use the secrets mechanism, which is the only decent approach for passing secrets into a build.

boonware avatar Jan 10 '22 12:01 boonware

So this is a bug, right? I have a build image:

ONBUILD COPY Gemfile* /application/
ONBUILD RUN bundle install

and then a multi-stage build; however, if I change the contents of the Gemfile, it won't pick it up:

FROM build-image:foo as builder

FROM base-image

I can confirm that running a build with BuildKit off works and picks up the changes.

c-ameron avatar Apr 26 '22 12:04 c-ameron

It also looks like someone else is having this problem https://stackoverflow.com/questions/66952378/docker-multistage-onbuild-not-executing-first-stage-onbuild-commands

c-ameron avatar Apr 26 '22 12:04 c-ameron

Just for the sake of adding another usage example of this (now missing) feature: https://github.com/r2d2bzh/docker-build-nodejs

This project started internally 3 years ago. At that time it was designed this way after reading the article pointed to by @astorath in a previous comment, and it also closely relates to the situation described by @EricHripko.

Many of us are still not using buildkit. This means that, for now, we have to support both the pre-buildkit and post-buildkit environments. The only way to do that ATM is to provide two different build methods, one for non-buildkit users and another for buildkit users.

What is also clearly odd for someone using the docker compose plugin is that these errors start popping up between versions 2.3.3 and 2.3.4 of the plugin. It took me at least an hour to relate these behaviors to this issue, which is OK, but perhaps a warning on this particular subject in the Dockerfile reference (or somewhere else) might help people spot or avoid the issue more easily. I can help document this, provided the proper guidance.

gautaz avatar May 04 '22 17:05 gautaz