Multistage builds & image sizes

Open · vitaliitylyk opened this issue 4 years ago · 8 comments

At the moment, the variant images (e.g. sitecore-xm-jss, sitecore-xm-sxa-jss, etc.) use a multi-stage build to reduce the number of layers. Most of the work is done in a build stage, and the resulting second-stage images just copy the results to wwwroot:

COPY --from=build ["C:\\inetpub\\wwwroot\\", "C:\\inetpub\\wwwroot\\"]

At first glance this looks good, but...

The problem

The COPY command creates a new layer containing the whole wwwroot, so it increases the resulting image size by the size of wwwroot, even though most of those files are unchanged. Since variant images are built on top of each other (e.g. sitecore-xm-sxa-jss is based on sitecore-xm-sxa), this layering increases the total image size by approximately 400MB for every such layer. So sitecore-xm-sxa-jss adds ~800MB to the image size.

Proposed solution

Instead of copying the whole wwwroot directory, we should copy only the changed files. For example, when JSS is installed in the sitecore-xm-jss image, we only need to copy the contents of the JSS package (and make sure the transformations are applied to web.config):

  1. Unpack the JSS package to c:\temp\wwwroot
  2. Copy web.config from BASE_IMAGE to c:\temp\wwwroot
  3. Run XDT transforms
  4. In the second-stage image, copy c:\temp\wwwroot to c:\inetpub\wwwroot
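The four steps above could be sketched as a two-stage Dockerfile like the one below. This is a minimal sketch, not code taken from the repo: it assumes the JSS package is a plain zip of wwwroot-relative files, and `packages/jss.zip`, `tools/Invoke-XdtTransform.ps1` and that script's `-Path`/`-XdtPath` parameters are hypothetical placeholders.

```dockerfile
# Minimal sketch of the "partial COPY" approach.
# Hypothetical: packages/jss.zip, tools/Invoke-XdtTransform.ps1 and its parameters.
ARG BASE_IMAGE

FROM ${BASE_IMAGE} AS build
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# Hypothetical inputs: the JSS package (assumed to be a plain zip of
# wwwroot-relative files) and a helper script that applies XDT transforms
COPY ["packages/jss.zip", "C:/temp/jss.zip"]
COPY ["tools/Invoke-XdtTransform.ps1", "C:/tools/"]

# 1. Unpack the JSS package into C:\temp\wwwroot (only the files JSS adds)
RUN Expand-Archive -Path 'C:/temp/jss.zip' -DestinationPath 'C:/temp/wwwroot'

# 2. Copy web.config from the base image into the staging folder
RUN Copy-Item -Path 'C:/inetpub/wwwroot/Web.config' -Destination 'C:/temp/wwwroot/Web.config' -Force

# 3. Run the XDT transforms on the staged web.config
#    (hypothetical script: applies the *.xdt files under -XdtPath to the files under -Path)
RUN & 'C:/tools/Invoke-XdtTransform.ps1' -Path 'C:/temp/wwwroot' -XdtPath 'C:/temp/wwwroot'

# 4. Second stage: copy only the staged files on top of the base wwwroot,
#    so the new layer holds the JSS files plus the transformed web.config
FROM ${BASE_IMAGE}
COPY --from=build ["C:/temp/wwwroot/", "C:/inetpub/wwwroot/"]
```

Built with something like `docker build --build-arg BASE_IMAGE=<base image> .`, the final COPY layer should contain only the staged files instead of a full copy of wwwroot. One caveat of this design: anything the installation would delete or modify outside the staging folder is not captured.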

If you agree with the approach, I can submit a PR ;)

vitaliitylyk · Apr 02 '20 15:04

Hi,

I have seen this behavior before, where images become unreasonably larger than expected. I think it is worth investigating whether we can improve in this area (size, layer count, startup time, build time).

I agree with your proposed solution. Another thing to try: the build stage doesn't have to use the base image; it could just use a clean windowsservercore image (the XDT script would then need to be copied in as well). Not sure if that would improve anything, but it may be worth a try.

May I suggest that you try it out on a copy of, for example, sitecore-xm-sxa-jss (I think that has the "deepest" inheritance tree) and then compare size, layer count, build time and startup time. With those results, combined with "readability", we will have a good indication of whether this is the way forward.

pbering · Apr 02 '20 17:04

Would it be possible to have a "base" Sitecore image containing the Sitecore root, and then have all the other Sitecore images just copy in the differences, something along these lines:

  1. Create a base Sitecore image containing a clean Sitecore installation extracted into wwwroot.
  2. To create xp-cm, create a build image based on the above image and make the changes to the wwwroot content there.
  3. Finally, in the resulting image, instead of copying all the wwwroot content, use robocopy (or something similar; not sure if it is possible) to sync only the needed changes, so the only content of the xp-cm layer is the few changes from the base image to xp-cm.

Repeat steps 2-3 for every other image.

If possible, that should cut down the size of the images, since the layers for the different Sitecore instances should be pretty small.

GurliGebis · Apr 03 '20 05:04

I did some tests for the sitecore-xm-sxa-jss image and here are the results:

| | Build time | Layer count | Size |
|---|---|---|---|
| Full COPY | 0:00:59 | 24 | 9.48GB |
| Partial COPY | 0:00:54 | 24 | 8.66GB |

Regarding startup time: I don't really know how best to measure it, but on my machine it seems to be the same. Either way, I think a decrease of ~0.8GB (close to 1GB) in image size is worth it.

I have also found an existing bug about the COPY command, registered 3 years ago, that has never been fixed: https://github.com/moby/moby/issues/21950. It is hard to say whether it will ever be fixed. There is also a Stack Overflow thread about this with some solutions: https://stackoverflow.com/questions/36553502/is-there-a-way-to-add-only-changed-files-to-a-docker-image-as-a-new-layer-with. But I think these solutions are overkill for this particular case.

@pbering regarding your comment about not using the base image in the build stage: for the transformation to work we need to copy web.config from the base image anyway, so it is useful to have it.

@GurliGebis I'm not sure I understand what the "base image" would contain. CM and CD images are built from different Sitecore .zip packages, and the same applies to the XM/XP topologies, so I'm not sure how one would create such a base image. Also, I think it is not possible to robocopy from other stages; you can only use the COPY command.

vitaliitylyk · Apr 03 '20 13:04

Even when the build stage doesn't use the existing base image, you can still copy from it with something like COPY --from=runtime ["C:/inetpub/wwwroot/Web.config", "C:/inetpub/wwwroot/Web.config"] and then run the transformations.
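A build stage that starts from a clean Windows Server Core image and only pulls web.config out of the existing image might look roughly like this. It is a sketch under assumptions: the servercore tag (which should match the Windows version of the base image), `packages/jss.zip`, `tools/Invoke-XdtTransform.ps1` and its parameters are hypothetical placeholders, and `runtime` is simply the stage name used in the comment above.

```dockerfile
# Sketch: clean build stage that borrows only web.config from the Sitecore image.
# Hypothetical: the servercore tag, packages/jss.zip, tools/Invoke-XdtTransform.ps1.
ARG BASE_IMAGE
ARG BUILD_IMAGE=mcr.microsoft.com/windows/servercore:ltsc2019

# Name the existing Sitecore image "runtime" so other stages can copy from it
FROM ${BASE_IMAGE} AS runtime

FROM ${BUILD_IMAGE} AS build
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# The clean image has no XDT tooling, so the script must be copied in
COPY ["tools/Invoke-XdtTransform.ps1", "C:/tools/"]
COPY ["packages/jss.zip", "C:/temp/jss.zip"]
RUN Expand-Archive -Path 'C:/temp/jss.zip' -DestinationPath 'C:/temp/wwwroot'

# Pull only web.config out of the runtime image, then transform it
COPY --from=runtime ["C:/inetpub/wwwroot/Web.config", "C:/temp/wwwroot/Web.config"]
RUN & 'C:/tools/Invoke-XdtTransform.ps1' -Path 'C:/temp/wwwroot' -XdtPath 'C:/temp/wwwroot'

# Final image: the runtime image plus the staged files only
FROM runtime
COPY --from=build ["C:/temp/wwwroot/", "C:/inetpub/wwwroot/"]
```

The final layer has the same content as when the build stage is based on the Sitecore image; whether the clean build stage changes build time would still need to be measured.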

Anyway, I agree that 1 GB is worth it, especially when the layer count is the same :)

Can you do a WIP PR so I can see it?

pbering · Apr 03 '20 18:04

My point is that CM and CD are 99.9% identical files, so having a small layer on top of CM containing the differences between CM and CD should save a lot of space. It would require the ability to sync instead of copy, though.

GurliGebis · Apr 03 '20 19:04

Hi @vitaliitylyk, can you push the branch? I want to check the layers with Dive.

bplasmeijer · Apr 06 '20 07:04

@pbering @bplasmeijer I have created a draft PR (for sxa-jss images only) so you can have a look/test: https://github.com/Sitecore/docker-images/pull/287

vitaliitylyk · Apr 06 '20 08:04

Hi @vitaliitylyk, can you do a new test with the new images from scr.sitecore.com? Thanks, Bart

bplasmeijer · Nov 06 '20 08:11