
Design: linking frontends

vlad-ivanov-name opened this issue 1 year ago · 8 comments

This issue is for discussing design of "linking frontends" feature.

Original discussion can be found here

Problem

As builds get more complicated, Dockerfiles get longer and harder to maintain. It is easier to put separate sections into separate Dockerfiles, but that currently requires multiple builds.

Instead of every project building a dependency directly inside the project's Dockerfile with lots of duplication, it would be better to use the upstream Dockerfile directly.

BuildKit supports flexible frontends, but currently you need to pick a single frontend and can't use the result from one frontend inside another.

Proposed solution

When a stage (FROM command) points to a path, a git repository, or an HTTPS URL, a nested build is performed in that location, and the result can be referenced by the stage name. A subselection can be performed with stage.selector, which retrieves a specific target from the subbuild.

Paths from context need to start with ./.

FROM ./myapp2.Dockerfile AS app2

FROM alpine
COPY --from=app2 / /bin/
FROM git://github.com/opencontainers/runc.git#master AS runc

FROM alpine
COPY --from=runc /usr/bin/runc /bin/
FROM https://gist.github.com/foo/utils.Dockerfile AS deps

FROM deps.foo
# equivalent to `docker build --target=foo -f - . < https://gist.github.com/foo/utils.Dockerfile`

Imported Dockerfiles must be fully functional Dockerfiles. You cannot import a snippet of commands.

Context for the build is:

  • for path, same context as main build
  • for git, the git repository files
  • for https, empty

When build-argument values are passed to nested builds, they need to be scoped with the stage name, e.g. docker build --build-arg runc.buildtags="apparmor" .
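For example (a sketch - the buildtags build argument inside the nested Dockerfile is made up for illustration):

# myapp2.Dockerfile, referenced from the main Dockerfile as stage "app2"
FROM alpine
ARG buildtags
RUN echo "building with tags: $buildtags"

# the app2. prefix routes the value to that nested build only
docker build --build-arg app2.buildtags="netgo" .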

Advanced form

For advanced cases where you want to use a custom frontend not defined with #syntax, or provide more inputs than just the build context, there is a special "advanced form" of nested build.

FROM @ AS foo
ENV foo=bar
ARG BUILDKIT_FRONTEND=dockerfile.v0
ARG BUILDKIT_INPUT_DOCKERFILE=dockerfile
ARG BUILDKIT_INPUT_CONTEXT=context
ARG BUILDKIT_INPUT_EXTRA=stage1

This form allows defining the full set of parameters and inputs that BuildKit accepts for invoking frontends.
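For reference, most of these inputs already have counterparts in today's buildctl CLI; the mapping sketched below is an assumption about how the ARGs above would translate:

# rough mapping (assumed):
#   BUILDKIT_FRONTEND         -> --frontend
#   BUILDKIT_INPUT_DOCKERFILE -> --local dockerfile=
#   BUILDKIT_INPUT_CONTEXT    -> --local context=
#   BUILDKIT_INPUT_EXTRA      -> --opt target= (extra input / target selection)
buildctl build \
  --frontend dockerfile.v0 \
  --local dockerfile=. \
  --local context=. \
  --opt target=stage1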

vlad-ivanov-name · Aug 24 '22 08:08

A feature that I think we really want to support is not just mixed versions of a specific frontend (like dockerfile v1.3 and dockerfile v1.4), but composing multiple different frontends together. For example, imagine a frontend whose "Dockerfile" is simply a list of apt packages to install into a Debian image (just as an example, this isn't a seriously suggested syntax):

git
build-essential
clang
llvm

Then, we might want to take the image that is built by this frontend and use it in a Dockerfile:

RUN git clone ...
WORKDIR ...
RUN make build && make install

Using the syntax suggested, this would look like:

  • deps.Dockerfile
    #syntax=foo/apt:ubuntu
    git
    build-essential
    clang
    llvm
    
  • Dockerfile
    #syntax=docker/dockerfile:1
    FROM ./deps.Dockerfile
    RUN git clone ...
    WORKDIR ...
    RUN make build && make install
    

I also wonder if we might want to allow mixing multiple syntaxes into a single Dockerfile, possibly with multiple syntax directives. I think this would be a pretty useful feature, since it saves users from needing to split everything up - which would be especially useful if we get a collection of "utility" frontends. For example, I can imagine frontends that do build post-processing, perform image slimming as part of the build, or automatically collect attestations from the filesystem to add to the image, etc. In the example below I've used ^ to represent the previous "stage" (it's not a Dockerfile stage). If we introduced a way of naming these "meta-stages", then we could potentially try to merge them with the existing Dockerfile stages to ensure that we get a consistent look and feel.

  • Dockerfile
    #syntax=foo/apt:ubuntu
    git
    build-essential
    clang
    llvm
    
    #syntax=docker/dockerfile:1
    FROM ^
    RUN git clone ...
    WORKDIR ...
    RUN make build && make install
    

If we did go with this splitting approach based on directives, then we could introduce it by creating a new "meta" frontend whose sole purpose is to do the syntax detection, split up the Dockerfile to forward to the various frontends, and tidy up errors and warnings with their line numbers. Currently a lot of this logic is in the dockerfile frontend, but we could move it out to another location and then set this new meta frontend as the new default.

jedevc · Aug 24 '22 08:08

Real-life use case for multiple Dockerfiles: imagine a set of applications using different ML frameworks, where the environments for testing and building those applications have differences but share base dependencies. The Dockerfile dependency graph would look something like this:

graph BT
    A[tensorflow.Dockerfile] --> B[ubuntu-ci-base.Dockerfile]
    C[pytorch.Dockerfile] --> B
    D[library-a.Dockerfile] --> C
    F[library-b.Dockerfile] --> C
    E[integration-tests.Dockerfile] --> D

vlad-ivanov-name · Aug 24 '22 09:08

@jedevc

imagine a frontend whose "Dockerfile" is simply a list of apt packages to install into a Debian image

How would this be combined with this requirement:

Imported Dockerfiles must be fully functional Dockerfiles. You cannot import a snippet of commands.

Would that mean you can only call this apt frontend once, meaning all following apt calls must be defined as normal RUN commands?

vlad-ivanov-name · Aug 24 '22 09:08

I think we'd definitely want to support multiple source Dockerfiles; the merged syntaxes in a single file are more of an extension, and a suggestion for something I think we should do.

Imported Dockerfiles must be fully functional Dockerfiles. You cannot import a snippet of commands.

I think we should drop this requirement, or at least rephrase it. I don't think we should implement this at the Dockerfile level, but at a conceptual level above that - we shouldn't just allow connecting Dockerfiles together, we want to allow connecting arbitrary frontends of different versions/syntaxes/etc. I think "source file" is probably a better term, since "Dockerfile" implies that the syntax is the Dockerfile syntax, while we should allow mixing and matching syntaxes.

I think I'm imagining something very similar to the concept of stages - unfortunately, at the moment, these are specific to Dockerfiles, and not exposed at the gateway interface. We could potentially rework some of the API to allow exposing a target, which would then let stages be combined across different frontends.

jedevc · Aug 24 '22 09:08

Maybe we could try to imagine what the syntax or workflow would look like from the user's side, and then see how that fits into the backend? So in an example with the apt frontend and a top-level Dockerfile, how would you call this frontend twice? Something like being able to reference "self" as the base?

FROM ubuntu:focal

# commands

FROM deps-1.Dockerfile WITH @

# commands

FROM deps-2.Dockerfile WITH @

I could also see passing a stage name to a frontend, but it would force users to invent stage names where they don't really need them.
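For comparison, the stage-name variant (still with the made-up WITH syntax) would look something like:

FROM ubuntu:focal AS base

# commands

FROM deps-1.Dockerfile WITH base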

vlad-ivanov-name · Aug 24 '22 09:08

Imported Dockerfiles must be fully functional Dockerfiles. You cannot import a snippet of commands.

I think we should drop this requirement, or at least rephrase it.

This requirement means that you can not have a Dockerfile that just has

RUN foo
RUN bar

And then include it in

FROM alpine
FROM ./other-dockerfile

The imported Dockerfile is evaluated on its own; it's not that commands from that Dockerfile are copied into the other one. Agreed that this is not Dockerfile-specific at all but works at the frontend level.

WITH @

I'm not quite sure what this means.

Advanced form

This looks quite weird and maybe should be dropped from the initial work. But it shows one of the problems. When a Dockerfile gets executed, it has a build context and additional contexts with --build-context. When loading another Dockerfile, what is the context for it? I think a good default is that it has the same main context, but that may not be correct for all cases.

For example, if you compare this with the linked Dockerfiles that docker buildx bake allows today https://www.docker.com/blog/dockerfiles-now-support-multiple-build-contexts/, then those cases can have multiple Dockerfiles with their own source files and build arguments that are then merged together into a single build.
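For reference, that "linked Dockerfiles" setup with bake looks roughly like this today (a sketch; file and target names are made up):

# docker-bake.hcl
target "base" {
  dockerfile = "ubuntu-ci-base.Dockerfile"
}

target "app" {
  dockerfile = "app.Dockerfile"
  contexts = {
    # the result of the "base" target becomes a named context for "app"
    base = "target:base"
  }
}

# app.Dockerfile can then simply start from that named context:
# FROM base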

One of the alternatives would be to make the Dockerfile frontend understand the bake definition as well. So you could use the result of a bake target inside a Dockerfile (while still keeping the security sandbox, of course).

tonistiigi · Aug 25 '22 02:08

I'm not quite sure what this means.

Just trying to come up with something based on the example apt frontend @jedevc mentioned -- I wanted to show that a setup where you can only call apt once has limited usefulness. WITH @ means use the current "intermediate" container as the base; the apt frontend would then use this container as the source to build on top of. Of course, it doesn't mean the Dockerfile of the apt frontend is not independent; this proposed syntax is just a shortcut for:

main.Dockerfile

FROM ubuntu as stage1
ADD file /

frontend.Dockerfile

ARG BASE_IMAGE
FROM ${BASE_IMAGE}

Normally one could build stage1, tag it and pass it as input to another Dockerfile. Having a shortcut here would be nice.
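For reference, the manual version with today's CLI would be something like (the temporary tag name is made up):

docker build -f main.Dockerfile --target stage1 -t tmp/stage1 .
docker build -f frontend.Dockerfile --build-arg BASE_IMAGE=tmp/stage1 .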

This looks quite weird and maybe should be dropped from initial work.

agree

vlad-ivanov-name · Aug 25 '22 07:08

When loading another Dockerfile, what is the context for it? I think a good default is that it has the same main context, but that may not be correct for all cases.

I assume keeping the main context would be the easiest thing to implement. As long as there is a way to provide some sort of arguments to the other Dockerfile, users would be able to just specify and use additional contexts with --build-context.
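e.g. something like this (assuming named contexts from --build-context would simply be visible inside the imported Dockerfile as well - that inheritance is an assumption):

docker buildx build --build-context extrasrc=./vendored .

# inside the imported Dockerfile:
FROM alpine
COPY --from=extrasrc / /src/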

vlad-ivanov-name · Aug 25 '22 07:08