Multi-stage Docker builds - reduce file size?
@dhirschfeld previously wrote the following in the JupyterLab Gitter chat, and I'd love to learn more about it!
> @consideRatio - I use Docker multi-stage builds: build the extensions in a throwaway container, then copy the lab folder into my final image. This has the benefit of ensuring you don't end up with any npm cache stuff in your final image, and it minimises layers.
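For illustration, here is a minimal sketch of that idea - not @dhirschfeld's actual Dockerfile. The extension name is a placeholder, and the lab folder path assumes a conda-based image such as the docker-stacks ones:

```dockerfile
# Hypothetical sketch: build JupyterLab extensions in a throwaway stage,
# then copy only the finished lab folder into the final image.
FROM jupyter/base-notebook AS labbuilder
# Placeholder extension; the build pulls in npm caches and intermediates
# that are discarded together with this stage.
RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build && \
    jupyter lab build

FROM jupyter/base-notebook
# --chown matches the jovyan user (NB_UID=1000, GID=100) in these images.
COPY --chown=1000:100 --from=labbuilder /opt/conda/share/jupyter/lab /opt/conda/share/jupyter/lab
```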
Background: What is a multi-stage Docker build?
What if you needed a mountain of tools to build your source code, tools that are never used at runtime? You could build the source in an image that has all of them installed, and then copy only the compiled binary into a fresh, slim image. That is what the following example does; for more details see this documentation.
```dockerfile
# Stage 0: compile the Go binary using the full golang toolchain
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Stage 1: start from a slim base and copy in only the compiled binary
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
```
@dhirschfeld do you happen to have an estimate of the effect of this on the image sizes?
This could turn out to be a significant improvement for the docker-stacks images, or it could make no difference; I don't know! But if we manage to reduce the size of these images, we would reduce the time it takes to start a fresh server, as is often done in cloud environments. That not only improves the user experience but would often also reduce costs.
Related: https://github.com/jupyterlab/jupyterlab/issues/4930#issuecomment-406448725
Definitely worth looking into if someone wants to give it a shot. I know we do some amount of cleanup along the way through the build (apt-get, node, conda), but perhaps there's room for improvement; the sketch below shows the general pattern.
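For context, cleanup only saves space if it happens in the same RUN instruction that created the files; files deleted in a later layer are still stored in the earlier one. A representative sketch (package names are illustrative, not the exact docker-stacks recipe):

```dockerfile
# Illustrative cleanup pattern: purge caches in the same RUN layer that
# created them, so the deleted files never get baked into a layer.
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
# Conda- and npm-based stages do the equivalent with, for example:
#   conda clean --all -f -y
#   npm cache clean --force
```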
I have been working on a multi-stage variant of the base-notebook container lately for a separate project. My use-case is a bit different -- I'm trying to make it easier to have both an Ubuntu-based and a CentOS-based version of a base Jupyter container -- but the Ubuntu version is essentially the same as your base-notebook, rearranged to support multi-stage building (a skeleton of the idea is sketched below). In my initial testing it hasn't made a size difference, because of the cleanup you're already doing, but there may be some more tuning I could do. What is the best way of providing examples to you for feedback? I'd be happy to generate a PR or two, even if they are rejected, just to further the discussion.
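Not @echowhisky's actual Dockerfile, just a hypothetical skeleton of how one multi-stage Dockerfile can be parameterised over the base distribution by declaring an ARG before the first FROM:

```dockerfile
# Hypothetical skeleton: one Dockerfile, selectable base distribution.
# Build with: docker build --build-arg BASE_IMAGE=centos:7 .
ARG BASE_IMAGE=ubuntu:20.04

FROM ${BASE_IMAGE} AS builder
# ...install build tooling and assemble the Jupyter environment here...

FROM ${BASE_IMAGE}
# ...copy only the finished environment out of the builder stage...
COPY --from=builder /opt/conda /opt/conda
```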
@echowhisky thanks for sharing what you're working on. I think opening a pull request is a fine thing to do so that folks interested in the topic can see and discuss the changes you're making.
I'd support the above. On our project we had a similar issue with image size increasing, and we also adopted multi-stage builds (since we built clean from git, with build tools) as one part of the approach.
I just came here after looking at using docker-compose - with Kafka, ZooKeeper, and our project (Egeria - just the core) - and it was the notebook image that stood out dramatically in size, at around 2.7 GB (as reported by `docker images`). We used it as part of a docker-compose setup for a tutorial (for 'proper' deployments, Kubernetes is more typical).
Of course I chose the image to benefit from reuse, flexibility, and simplicity. It does include more than just the notebook, of course, but on macOS my Homebrew-installed Jupyter is only 67 MB, so it's quite a big increase :-( Python and IPython add another 165 MB or so.
If I get a chance I'll try to read through 858 in more detail - I've never looked at notebook before, so perhaps it is just rather tough to slim down.
Just for your information: my largest image, which I currently offer to my business customers, is now almost 9 GB. I use the data science image, and because JupyterLab plugins also take quite a lot of space, the image has grown quite a bit. Well, those enterprises... At my "scale", I will need to create a new image and try to make it very slim.
@consideRatio @parente @echowhisky @planetf1 @dmpe I haven't found any places where we could use multi-stage Docker builds:
- We clean up after `apt` and only install packages when necessary.
- We don't install any build-related packages anymore (like gcc).
- We do not use `pip install`, but always use `mamba`, which is why we don't build anything.
Please tell me if I missed something, but I don't think we can easily apply this to reduce our image sizes.
I think we could, for example, drop wget and bzip2 from the final image when downloading and extracting micromamba (see the sketch below), but this would make the Dockerfile a bit more difficult to read and would only save a few megabytes.
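A minimal sketch of what that could look like; the base image is illustrative, and the download URL follows the pattern from the micromamba documentation:

```dockerfile
# Hypothetical sketch: fetch micromamba in a throwaway stage so that
# wget and bzip2 never appear in the final image.
FROM ubuntu:22.04 AS fetcher
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates wget bzip2 && \
    wget -qO- https://micro.mamba.pm/api/micromamba/linux-64/latest | \
    tar -xjf - bin/micromamba

FROM ubuntu:22.04
# micromamba is a single static binary, so copying it is all that's needed.
COPY --from=fetcher /bin/micromamba /bin/micromamba
```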
@mathbunnyru :+1: this is not so applicable to us overall. It is more relevant when you have an image so trimmed down that you don't include much at all in the final build stage, and still need to do various compilations etc. first.