docker-introduction Section on Layers

One of my learners from a couple of weeks ago reported back that he made an image for his work recently! 😄

He mentioned that he didn't really know much about layers until he was constructing this image. While I off-handedly mentioned layers during the workshop, he suggested we add a 15 min section about them, and I agree it would be a good idea. I'm wondering if we could add a conceptual explanation and also build up a Dockerfile one layer at a time, showing that the previous layers are cached and don't get rebuilt (unless you change something above them)?

May 31 '19 17:05 sstevens2

👋 that was me. The workshop was great. I've shared the materials with others in my research group who couldn't attend.

I recall that the another-greeting example here intentionally fails at first. When it builds successfully, we see the cache being used:

Sending build context to Docker daemon  3.072kB
Step 1/4 : FROM python:3-slim
 ---> ca7f9e245002
Step 2/4 : WORKDIR /usr/src/app
 ---> Using cache
 ---> c0d009871ab7
Step 3/4 : COPY test.py .
 ---> 23b27e9f57a9
Step 4/4 : CMD [ "python", "./test.py" ]
 ---> Running in 0d48c16b40c1
Removing intermediate container 0d48c16b40c1
 ---> bede6575d987
Successfully built bede6575d987
Successfully tagged another-greeting:latest

At the time, I wasn't paying attention to caching. This container was so simple that I also didn't notice the cache helping much. The best practices on build stages were very informative.

A simple lesson to highlight this could examine what happens in two different images when the layers are ordered in the recommended way versus the inverse order.

Recommended:

RUN install stable software
RUN other stuff
COPY my-file.txt .

Inverse:

COPY my-file.txt .
RUN other stuff
RUN install stable software

If we build, modify the file, and build again, would this exhibit different caching behaviors?
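One way to reason about this before trying it in Docker is a toy model of the build cache. The sketch below is Python, not Docker itself, and it rests on a simplifying assumption: a step can be reused only if its parent step, its instruction text, and (for COPY) the content of the copied file are all unchanged. The function names `cache_keys` and `rebuilt_steps` are hypothetical, made up for this illustration.

```python
import hashlib


def cache_keys(instructions, files):
    """Return one cache key per instruction.

    Simplified assumption: a step's key depends on its parent's key, the
    instruction text, and (for COPY) the content of the copied file.
    """
    keys = []
    parent = "base-image"
    for line in instructions:
        material = parent + "\n" + line
        parts = line.split()
        if parts[0] == "COPY":
            # COPY <src> <dest>: the source file's content is part of the key
            material += "\n" + files[parts[1]]
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(instructions, files_before, files_after):
    """List the instructions whose cache key changes between two builds."""
    before = cache_keys(instructions, files_before)
    after = cache_keys(instructions, files_after)
    return [line for line, b, a in zip(instructions, before, after) if b != a]


recommended = [
    "RUN install stable software",
    "RUN other stuff",
    "COPY my-file.txt .",
]
inverse = list(reversed(recommended))

# The same file before and after an edit (contents are made up)
v1 = {"my-file.txt": "hello"}
v2 = {"my-file.txt": "hello, world"}

print(rebuilt_steps(recommended, v1, v2))  # ['COPY my-file.txt .']
print(rebuilt_steps(inverse, v1, v2))      # all three steps
```

Under this model, editing the file in the recommended order invalidates only the final COPY step, while in the inverse order every step after the COPY loses its cache as well. A real build would still be needed to confirm that Docker behaves the same way.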

May 31 '19 19:05 agitter

Thanks for your feedback on layers overall, @agitter! More specifically, thanks for your suggested simple lesson on recommended and inverse layer order; that looks ideal.

Further is the possibility to demonstrate the value of merging consecutive RUN lines in terms of reducing the number of layers.

I'll definitely try to add this with credit before I use it in (non-Carpentry) teaching within a few weeks, if this doesn't emerge otherwise before that time.

Jun 04 '19 11:06 dme26

Further is the possibility to demonstrate the value of merging consecutive RUN lines in terms of reducing the number of layers.

That's a good idea. That structure confused me when I inspected Dockerfiles before learning how they worked.

Jun 04 '19 11:06 agitter

This can probably be done in the lesson as a callout that essentially says each command in the Dockerfile is cached, so there are optimization/speed/rebuilding ramifications depending on the order of commands. In general, system package installations are done towards the top and take the most setup time, and the user-specific code should be put towards the bottom so that changes to the user-supplied codebase do not re-trigger an entire new image build.

The "recommended" and "inverse" example (https://github.com/carpentries-incubator/docker-introduction/issues/6#issuecomment-497825784) can be added to the callout to show this.

Having said all this, you'll see a lot of Dockerfiles do almost everything in a single layer.

Jun 02 '21 23:06 chendaniely

In terms of optimization, multi-stage builds should be considered for more complex applications, e.g. builds reaching out to some binary or any situation where order must be a priority (as mentioned and elaborated upon above). I wanted to point out this resource, which walks through multi-stage builds; its example 1 highlights performance and includes a clear example image that explains layers.
Further is the possibility to demonstrate the value of merging consecutive RUN lines in terms of reducing the number of layers.

There are a few ways to do this, and doing so in the 'Creating More Complex Images' section would be appropriate. Common syntax includes either:

RUN wget xyz && tar xyz

OR

RUN wget xyz && \
    tar xyz
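To make the effect on layer count concrete, here is a rough Python sketch that counts layer-creating instructions in Dockerfile text. It assumes that only RUN, COPY, and ADD add layers, and it uses a naive line-based parse that joins backslash-continued lines; the helper name `count_layers` is hypothetical and this is not how Docker itself parses a Dockerfile.

```python
# Assumption: only these instructions add a filesystem layer
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def count_layers(dockerfile_text):
    """Count layer-creating instructions, joining backslash-continued lines."""
    count = 0
    continuing = False  # True while inside a backslash-continued instruction
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not continuing and line.split()[0].upper() in LAYER_INSTRUCTIONS:
            count += 1
        continuing = line.endswith("\\")
    return count


separate = "RUN wget xyz\nRUN tar xyz\n"
merged = "RUN wget xyz && \\\n    tar xyz\n"

print(count_layers(separate))  # 2
print(count_layers(merged))    # 1
```

Both spellings of the merged form (one line, or split with a trailing backslash) count as a single RUN instruction here, which is the point of combining the commands.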

Jan 12 '22 06:01 vbagadia

Thanks for highlighting the post with details about multi-stage builds @vbagadia. I think this provides a really nice overview of the different options for producing more compact images while also being clear about both the positive and negative aspects of the different approaches covered.

At the same time, I think the multi-stage build content is beyond the scope of this lesson (indeed, I don't think you were suggesting that we include it anyway?). However, showing how to combine commands so that only a single layer is generated, along the lines of the examples you provide above, is a good thing to point out.

Even if we don't go into great detail, given the introductory nature of the lesson, I definitely think adding some further content to explain more about the cache and best practices for structuring Dockerfiles would be useful. As @chendaniely suggests, a callout could be a good option for this.

Jan 12 '22 12:01 jcohen02

Also mentioned in lesson peer review

Do we need to include this before we go back to reviewers?

Jul 29 '24 13:07 aturner-epcc
