
RFE: implement switching roots and sub-images (aka nested builds)

Open muayyad-alsadi opened this issue 9 years ago • 8 comments

As per our discussion in the links below, the proposed change is:

# uses the same tag that was passed to docker build
SWITCH_ROOT <other_image> <new_root> ...
# appends -<tag-suffix> to the repository name (the part of the tag before the :)
TAGGED_SWITCH_ROOT <tag-suffix> <other_image> <new_root> ...

I'm not sure about the naming: SWITCH_ROOT_N_PUBLISH, SWITCH_ROOT_W_TAG, or TAGGED_SWITCH_ROOT.

The idea is that the build root is kept: if there is no SWITCH_ROOT, it is used and tagged as usual. If there is at least one SWITCH_ROOT, the build root is set aside and a new image is used as the root for the tag that was passed. For TAGGED_SWITCH_ROOT, which can be used multiple times, the <new_root> paths are still resolved against the build image, which is not discarded until the end.

RUN make install SERVER_PREFIX=/dest/server CLIENT_PREFIX=/dest/client
RUN cp /bin/server-extra /dest/server/bin/
RUN cp /bin/client-extra /dest/client/bin/
SWITCH_ROOT busybox /dest/server
TAGGED_SWITCH_ROOT client busybox /dest/client

Using the above with docker build -t foobar:2.5 would result in two images: foobar:2.5 and foobar-client:2.5.
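Spelling out the mapping (a sketch of the proposed behavior, not an existing docker feature):

docker build -t foobar:2.5 .
# SWITCH_ROOT busybox /dest/server               -> foobar:2.5
# TAGGED_SWITCH_ROOT client busybox /dest/client -> foobar-client:2.5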

We could even support a special <other_image> value that starts with self:

RUN make install SERVER_PREFIX=/dest/server CLIENT_PREFIX=/dest/client
RUN mkdir -p /dest/server2/bin/ && cp /bin/server-monitor /dest/server2/bin/
RUN cp /bin/client-extra /dest/client/bin/
SWITCH_ROOT busybox /dest/server
TAGGED_SWITCH_ROOT slave self /dest/server 
TAGGED_SWITCH_ROOT slave-monitor self-slave /dest/server2
TAGGED_SWITCH_ROOT client busybox /dest/client

The idea originates from RPM SPEC files; here is an example: http://pkgs.fedoraproject.org/cgit/mariadb.git/tree/mariadb.spec

# for mariadb
%files
# for mariadb-libs
%files libs
# for mariadb-config
%files config
# for mariadb-common
%files common
# for mariadb-server
%files server

https://github.com/docker/docker/issues/7115#issuecomment-133548130
https://github.com/docker/docker/issues/15271
https://github.com/docker/docker/issues/14298#issuecomment-129616725

muayyad-alsadi avatar Aug 21 '15 20:08 muayyad-alsadi

Hi @muayyad-alsadi. Thanks for opening this issue.

I'm not a fan of having a build publish multiple images - I think there should always be only one result at the end. I'm also not a fan of having a build instruction that sets an image name or tag - that should be done separately after the build and is easier to reason about when there is only a single resulting image.

While I really like the SWITCH_ROOT/PIVOT_ROOT concept, I think it would be better to split it into separate commands that give more flexibility over what's in the resulting image: another FROM <image> instruction to begin a new build with the given base image, using the previous container's root filesystem as the context. The SWITCH_ROOT behavior would then be this followed by a COPY instruction. We would then only need to handle context switching and copying between containers (probably just through the client for now) in order to get this to work.
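A sketch of what that might look like (the repeated FROM is the hypothetical instruction being discussed here; the image names and paths are illustrative):

FROM golang:1.4
COPY . /src
RUN cd /src && make install PREFIX=/dest/server

FROM busybox
# the context is now the previous container's root filesystem,
# so this COPY pulls files out of the build container
COPY /dest/server /srv/server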

jlhawn avatar Aug 24 '15 17:08 jlhawn

I'm not a fan of having a build publish multiple images

Me neither. I don't like starting from a single Dockerfile and ending with multiple images, but there is a real-world use case: a monster code base that results in many sub-images (as in sub-packages).

For example, in Ubuntu a single source package for LibreOffice builds more than 150 binary deb packages.

Although this is an extreme case and unlikely to be a common docker use case, it's still a real-world example.

A more realistic example would be a common code base for both a server and a client, or for a server and its replication daemon, etc.

Since a Dockerfile is a single file, it is more like a spec file in the RPM world than Debian's directory of rules files. That's why I think we should look to it for inspiration (it took decades to become what it is now, and it's part of the Linux Standard Base). Here I'm talking about %package client, which builds base-client.rpm; likewise TAGGED_SWITCH_ROOT client busybox /dest/client would result in an image called base-client (where base is what was passed by the user).
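For reference, the RPM analogue looks roughly like this (an abridged, illustrative excerpt, not copied from the actual mariadb.spec):

# declares a subpackage that RPM will name <base>-client
%package client
Summary: Client tools built from the same source as the server
%description client
Client tools subpackage.

%files client
%{_bindir}/mysql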

I'm not going to use multiple images myself, but I'm sure someone will need it; it's a valid use case, and it won't cost us anything to implement.

To keep things clean, TAGGED_SWITCH_ROOT is not followed by a full [REPOSITORY[:TAG]] but rather by the suffix to be added to the base tag.

While I really like the SWITCH_ROOT/PIVOT_ROOT concept, I think it would be better to split it into separate commands that give more flexibility over what's in the resulting image: another FROM instruction to begin a new build with the given base image, using the previous container's root filesystem as the context. The SWITCH_ROOT behavior would then be this followed by a COPY instruction. We would then only need to handle context switching and copying between containers (probably just through the client for now) in order to get this to work.

Exactly. We would start with build container = working container and do all of the build/commit steps in the build container until we hit SWITCH_ROOT. At that point the build container remains the old one, but the working container points to a new one with a fresh FROM (scratch, busybox, etc.); we then copy the arguments following SWITCH_ROOT out of the build container (which is available to us via cp).
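A rough sketch of that flow in shell, assuming docker cp can stream tar archives through - (the names and paths are illustrative):

build=$(docker run -d build-image sleep 1000000)   # pre-pivot build container
work=$(docker create busybox true)                 # fresh working container
# copy the pivot directory out of the build container into the new root
docker cp "$build:/dest/server" - | docker cp - "$work:/"
docker commit "$work" foobar:2.5
docker rm -f "$build" "$work"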

Why not just use COPY? Well, you need to know that we have 3 locations:

  • host (directory with dockerfile)
  • build container (the one before pivot)
  • working/destination container

I don't think overloading the COPY and ADD commands is right (COPY HOST:/src/ /dest and COPY BUILD:/src/ /dest).

Supporting multiple builds (not really nested) is trivial, as there is always a single working/destination container and a single fixed build container, and all paths after SWITCH_ROOT refer to the original build container.

Each SWITCH_ROOT or TAGGED_SWITCH_ROOT does a docker commit to apply its tag, and at the end of the file we do the final commit and tag.

SWITCH_ROOT is only allowed once, while TAGGED_SWITCH_ROOT can be specified multiple times.

Neither SWITCH_ROOT nor TAGGED_SWITCH_ROOT inherits anything from the build image except the maintainer. In other words, exposed ports, volumes, cmd, entrypoint, the current directory, etc. are not inherited from the build image; they are reset to those of the pivot image, the only exception being the maintainer.

Exposed ports, volumes, cmd, and entrypoint are not allowed before a SWITCH_ROOT or TAGGED_SWITCH_ROOT.
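Putting these rules together, a full Dockerfile under the proposal might look like this (a sketch; the base images, paths, and port are illustrative):

FROM golang:1.4
COPY . /src
RUN cd /src && make install SERVER_PREFIX=/dest/server
SWITCH_ROOT busybox /dest/server
# runtime settings are only allowed after the pivot
EXPOSE 8080
CMD ["/bin/server"]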

muayyad-alsadi avatar Aug 24 '15 17:08 muayyad-alsadi

but there is a real-world use case. ... for example in ubuntu a single source for libreoffice builds more than 150 binary deb packages. ... a more realistic example would be a common code base for both a server and a client or for a server and its replication daemon, etc. ... I'm not going to use multiple images but I'm sure someone will need it; it's a valid use case, and it won't cost us anything to implement.

Container image building is not and should not be a solution to everything. A Dockerfile is used to produce a single container image. If you need multiple container images you can have several different Dockerfiles and even share common base images if you want to (this is a recommended pattern). If you do have a "monster code-base" with several subdirectories that each produce their own container image (which a lot of people do have, including myself) then you can have a separate Dockerfile for each container image you need to build. If you want to automate the process of building all of these images together then you can use a variety of other tools to accomplish that task (many people use Makefiles for this).
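For instance, that workflow might be driven by a few lines of shell or a Makefile (a sketch; the file and image names are hypothetical):

docker build -t my-app-core   -f Dockerfile.core   .
docker build -t my-app-client -f Dockerfile.client .
docker build -t my-app-server -f Dockerfile.server .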

I don't think overloading COPY and ADD commands is right (COPY HOST:/src/ /dest and COPY BUILD:/src/ /dest)

I wasn't proposing overloading them; they would behave the same, it's just the context that changes. The source path is always relative to the context directory (which can be local or in a container) and the destination is always in a container.
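In other words, something like this (illustrative; the second FROM is the hypothetical context-switching instruction discussed above):

# before the pivot: the context is the host build directory
COPY src/app.conf /etc/app.conf
FROM busybox
# after the pivot: the context is the previous container's root filesystem
COPY dest/app.conf /etc/app.conf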

jlhawn avatar Aug 24 '15 18:08 jlhawn

If you do have a "monster code-base" with several subdirectories that each produce their own container image (which a lot of people do have, including myself) then you can have a separate Dockerfile for each container image you need to build. If you want to automate the process of building all of these images together then you can use a variety of other tools to accomplish that task (many people use Makefiles for this).

No, it's not just about automating or scripting this with Makefiles; it's a completely different use case.

It's not just different images from the same code base, but from the same makefile and the same build. Take a look at Fedora's MariaDB SPEC file; the lines below compile both the mariadb server and the mariadb client tools:

%cmake . \
         -DCMAKE_INSTALL_PREFIX="%{_prefix}" \
# ....
make %{?_smp_mflags} VERBOSE=1
# ...
make DESTDIR=%{buildroot} install

If you want to create two separate images for the mariadb server and client, you need either to compile it twice, once per image (i.e. compile everything in a first pass and discard everything but the server, then compile everything again and discard everything except the client), or to use the proposed method.
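Under the proposal, a single compile could feed both images, roughly like this (a sketch; the base image and install paths are illustrative):

RUN cmake . -DCMAKE_INSTALL_PREFIX=/usr
RUN make && make DESTDIR=/dest install
# one compile, two images: server keeps the main tag, client gets -client
SWITCH_ROOT fedora /dest/server-tree
TAGGED_SWITCH_ROOT client fedora /dest/client-tree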

Let's get back to the LibreOffice/OpenOffice example (which has a server component, by the way, and is used by many other services like BigBlueButton). The last time I compiled OpenOffice it took more than 8 hours (since Fedora has a policy of not including any pre-built artifacts, including jars). Now imagine having 150 Dockerfiles, each taking 8 hours to build.

Go projects take no time to compile; maybe this is why you don't see why some people might need this. But there are some elephantine legacy projects that take forever to compile, and repeating this for every component is not a good idea, while putting them all in a single container is in many cases not the docker way (single process per container; the mariadb example was OK, though, because only the server would be running).

Sometimes a project requires you to build all components at once because there is some compile-time mapping for plugins, etc.

Currently what people do is build outside docker on the host (or download packaged files: rpm, deb, Ubuntu PPA, etc.) and then use ADD/COPY to put them in.

I wasn't proposing overloading them; they would behave the same, it's just the context that changes.

After the pivot, if we have COPY conf/server.conf /etc/foo.conf, does that mean copy conf/server.conf from the build container or from the current directory on the host?

Sorry for the long post. It's a matter of taste and I'm not trying to change your mind; I'm just trying to give you the full picture so that you can make up your mind knowing all the aspects.

muayyad-alsadi avatar Aug 24 '15 21:08 muayyad-alsadi

... Go projects take no time to compile; maybe this is why you don't see why some people might need this. But there are some elephantine legacy projects that take forever to compile, and repeating this for every component is not a good idea, while putting them all in a single container is in many cases not the docker way (single process per container; the mariadb example was OK, though, because only the server would be running) ...

The fast Go compiler is nice to have, but I'm very familiar with the problem of long compile times for big projects. I used to work on WebKit, which sometimes took almost a couple of hours to compile from scratch (depending on build flags and how many CPU cores you had dedicated to it; it typically took 20-30 minutes on the beefiest build machines with 12 cores and 24GB of memory). I don't expect every containerized component of LibreOffice to be built from scratch, taking several hours each. There has always been a way to compose your builds so as not to rebuild things. You can do it with multiple Dockerfiles and shared base images. I'll try an example:

Say you have a client/server application that you want to build container images for, the client and server each having separate images that you can spawn containers from and use independently. Your client and server are both built from a pretty big C/C++ codebase, and because you're a great code architect you make sure that the client and server share as much code as possible. When you build both the client and server locally it takes about an hour, and you want your resulting images not to have the source code in them, only the binaries.

The first thing you would do is write a Dockerfile which copies in all of the source code and compiles the shared libraries only. You might want to call this "my-app-core" or something similar. This image on its own isn't very useful. It contains all of your codebase and dependencies and the compiled shared object code but doesn't have either your client or server binary yet.

Next you'll want to build the client and server binaries separately. You'll write 2 more Dockerfiles: one for the client and one for the server. Starting with the client, you write a Dockerfile which is FROM my-app-core and then write more instructions to continue compiling the client binary. These steps are relatively fast because the base image "my-app-core" already contained a lot of the compiled code that goes into the client. The rest of the Dockerfile drops down to a nested build and copies the client binary into a new empty filesystem.

The Dockerfile for the server would be similar: FROM my-app-core, followed by the steps to compile the server binary which is boosted by the fact that so much of it was pre-built in the base image, then a nested build that creates a final image which contains only the server binary and any other small files you might need.
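Condensed into code, the scheme might look like this (a sketch; the nested-build FROM is the hypothetical instruction under discussion, and all names are illustrative):

# Dockerfile.core: compile the shared code once
FROM gcc:5
COPY . /src
RUN make -C /src shared-libs

# Dockerfile.server: finish the server build, then pivot to a minimal image
FROM my-app-core
RUN make -C /src server
FROM busybox
COPY /src/bin/server /bin/server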

You'll probably find that building your images this way is much faster than if you had built the client and server separately from scratch. The total build time is reduced because you used the common core base image containing the pre-built shared libraries, and you only had to build it once. Depending on how you do it, rebuilds after changes to only the client or only the server code may be even faster, because you can take advantage of the build cache and never have to rebuild the common core base image.

When you want to publish your application to your users you only need to release the resulting minimal client and server images and not the bloated intermediate "my-app-core" image.

jlhawn avatar Aug 24 '15 22:08 jlhawn

I actually do something similar. I currently have a system that uses fabric/ansible, and my base image has supervisord and openssh disabled. My build script uses docker exec to start sshd, then injects the host's public key, then runs the fabric/ansible build, then starts another base docker image, then scps or pipes tars between containers, then disables sshd, cleans up, and commits.

docker exec -i container1 bash -c "cd /dest && tar -cf - ." | docker exec -i container2 tar -xvf -

People in the Atomic project have their own integrated builder called reactor (it integrates with koji, the builder in the Fedora infrastructure that compiles all Fedora/EPEL rpms, pushes them, etc.):

https://github.com/projectatomic/atomic-reactor

But I believe a small, simple TAGGED_SWITCH_ROOT would eliminate all of those contraptions and make things much simpler.

muayyad-alsadi avatar Aug 28 '15 12:08 muayyad-alsadi

Hi, everyone. We found that being able to have multiple FROM instructions in a single Dockerfile is very convenient. Take a look at https://github.com/grammarly/rocker

ybogdanov avatar Sep 08 '15 19:09 ybogdanov

Regarding rocker with multiple FROMs, like this one:

FROM google/golang:1.4
MOUNT .:/src
WORKDIR /src
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -v -o rocker.o rocker.go

run image:

FROM busybox
ADD rocker.o /bin/rocker


First, it's a nice thing that just works and accomplishes the job.

But I have concerns: it opens the gates too wide; people would tend to do crazy things on the host and attach them to the container. For example, they might inject a setuid binary into the mounted volume.

My objective is to leave the host alone. You should not depend on what is installed on it, nor should you pollute it; the host's state after the build should be the same as before it.


muayyad-alsadi avatar Sep 08 '15 20:09 muayyad-alsadi