Improve time and space efficiency of backend Docker container build

Open MoralCode opened this issue 10 months ago • 10 comments

Is your feature request related to a problem? If so, please describe the problem: Disk space limitations make it hard to build the backend Docker image. The build had used at least 6GB before it failed on my Fedora VM, and it may need 12GB if podman requires twice the space to write the final image.

Potential solutions: After looking through the Dockerfiles, I noticed a few things that could be improved:

  1. Most of the space taken up by the image is due to Rust and Go dependencies that likely aren't needed at runtime, since both of those languages ship fat binaries with everything in one place (as far as I know; I only have passing experience with both languages).

  2. Relatedly, the source code that is cloned as part of these builds is also kept around. Removing it could save an additional 334M.
  3. The build, specifically install-workers-deps.sh, includes the NLTK popular metapackage, which pulls in many other packages. Are all of these sub-packages being used? (There are between 8k and 139M of savings depending on what can be excluded; I have a list of them by size in my notes.)
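
As a rough illustration of the third point, the sketch below builds an `nltk.downloader` command that fetches only named packages instead of the whole `popular` collection. The package list and target directory are placeholders, not the set Augur actually needs, and it is an assumption that install-workers-deps.sh fetches the data in a comparable way.

```python
import shlex
import sys

# Hypothetical subset for illustration only: the real list has to come from
# auditing which NLTK resources Augur's workers actually load.
REQUIRED_NLTK_PACKAGES = ["punkt", "stopwords"]


def nltk_download_command(packages, download_dir):
    """Build the argv for `python -m nltk.downloader -d <dir> <package>...`."""
    if not packages:
        raise ValueError("at least one NLTK package id is required")
    return [sys.executable, "-m", "nltk.downloader", "-d", download_dir, *packages]


# Example target directory (an assumption); print the command rather than
# running it, since running it requires nltk and network access.
command = nltk_download_command(REQUIRED_NLTK_PACKAGES, "/usr/share/nltk_data")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` during the image build would then download just those packages. Whether that is safe depends on confirming that nothing else from `popular` is loaded at runtime.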

My remaining questions so far:

  • Is there a reason that these dependencies are being built from source?
    • If not, would it be better to contribute these build steps upstream to places like OpenSSF, so Augur can rely on fetching the binaries from a popular registry (like crates.io or whatever Go uses)?

Additional context: Slack thread (in CHAOSS Slack) about this: https://chaoss-workspace.slack.com/archives/C0226ELG6R4/p1739815260328199

This PR seems to at least have been attempting a partial solution for the first two parts of this by creating a separate build step: https://github.com/chaoss/augur/pull/2947

The large image size may also cause problems for users on older versions of podman, which may have trouble committing large image layers.

MoralCode avatar Feb 19 '25 17:02 MoralCode

Also likely related: #2982

MoralCode avatar Feb 19 '25 17:02 MoralCode

@GregSutcliffe will have some thoughts on this. I know he has mentioned some things around Go and that there might be a significantly more efficient way to get the git info without needing a full clone.

cdolfi avatar Feb 19 '25 18:02 cdolfi

I'm refactoring the Docker container into more of a builder-pattern style (with separate containers for Go and Rust).

Are there any dependencies relying on Rust right now? I don't see any.

MoralCode avatar Feb 22 '25 20:02 MoralCode

@MoralCode It looks like we are using Rust in order to use cargo to install geckodriver. Sean added that install to the Dockerfile in 5a14c422504022da0ea6892f40dd954cef856da6

Ulincsys avatar Feb 23 '25 02:02 Ulincsys

@MoralCode It looks like we are using Rust in order to use cargo to install geckodriver. Sean added that install to the Dockerfile in 5a14c42

It seems like that has since been removed and replaced with a tarball method of installing Firefox and geckodriver: https://github.com/chaoss/augur/commit/9eb4b6150739a12166a84b7f9f17d84d42011104

MoralCode avatar Feb 23 '25 14:02 MoralCode

@sgoggins do you know of any Rust dependencies?

cdolfi avatar Feb 24 '25 02:02 cdolfi

@MoralCode Does #3012 close this?

cdolfi avatar Mar 11 '25 14:03 cdolfi

@MoralCode Does #3012 close this?

That PR addresses the first couple of points, but I am still curious about the remaining ones from the issue description:

The build, specifically install-workers-deps.sh, includes the NLTK popular metapackage, which pulls in many other packages. Are all of these sub-packages being used? (There are between 8k and 139M of savings depending on what can be excluded; I have a list of them by size in my notes.)

My remaining questions so far:

Is there a reason that these dependencies are being built from source? If not, would it be better to contribute these build steps upstream to places like OpenSSF, so Augur can rely on fetching the binaries from a popular registry (like crates.io or whatever Go uses)?

MoralCode avatar Mar 11 '25 14:03 MoralCode

@MoralCode ? :)

sgoggins avatar Mar 18 '25 14:03 sgoggins

What do you need from me for this? Right now I'm mostly waiting on answers to some of my remaining questions to see if I can reduce the space usage even further by:

  • removing unnecessary NLTK libraries (waiting on info regarding which ones are actually used)
  • using scc/scorecard binaries downloaded from Go's package manager rather than building from source, to save even more disk space and reduce the need for builder containers at all

I will probably also fold in the pip duplication from #3045 and/or #3053 if/when I end up making a PR for them and/or the above fixes.

MoralCode avatar Mar 18 '25 15:03 MoralCode

The Dockerfile commands that currently take the most space are:

https://github.com/chaoss/augur/blob/dc17a8f57e47a247aefa11c61f9c777b658aebd0/docker/backend/Dockerfile#L88 (3.46GB) and https://github.com/chaoss/augur/blob/dc17a8f57e47a247aefa11c61f9c777b658aebd0/docker/backend/Dockerfile#L28 (1.24GB).

$ podman history --human e811d5319587
ID            CREATED      CREATED BY                                     SIZE                 COMMENT
f5d1651065a4  2 days ago   /bin/sh -c #(nop) CMD ["/init.sh"]             0B                   
<missing>     2 days ago   /bin/sh -c #(nop) ENTRYPOINT ["/bin/bash",...  0B                   
<missing>     2 days ago   /bin/sh -c chmod +x /entrypoint.sh /init.sh    4.61kB               
542252eed461  2 days ago   /bin/sh -c #(nop) COPY file:469bc734639769...  2.05kB               
47a9ab059f88  2 days ago   /bin/sh -c #(nop) COPY file:28e7735772972f...  3.07kB               
26df5a1b1799  2 days ago   /bin/sh -c ln -s /cache /augur/augur/stati...  3.58kB               
a3e3853a7768  2 days ago   /bin/sh -c mkdir -p repos/ logs/ /augur/fa...  3.58kB               
5c7cec6c75e8  2 days ago   /bin/sh -c ${SCORECARD_DIR}/scorecard version  1.54kB               
b4ccb71fbab5  2 days ago   /bin/sh -c ${SCC_DIR}/scc --version            1.54kB               
790349537c42  2 days ago   /bin/sh -c #(nop) COPY file:7c3300d56112b9...  78.3MB               
68e87e6170d9  2 days ago   /bin/sh -c #(nop) ENV SCORECARD_DIR=/score...  0B                   
<missing>     2 days ago   /bin/sh -c #(nop) COPY file:4d4f7c23877871...  9.43MB               
fe33f0b1d72c  2 days ago   /bin/sh -c #(nop) ENV SCC_DIR=/scc             0B                   
<missing>     2 days ago   /bin/sh -c #(nop) ENV PATH="/augur/.venv/b...  0B                   
<missing>     2 days ago   /bin/sh -c --mount=type=cache,target=/root...  56.3kB               
b1105dd18933  2 days ago   /bin/sh -c find scripts -exec chmod u=rwx,...  138kB                
1fc1d2b312e4  2 days ago   /bin/sh -c find keyman -type d -exec chmod...  29.7kB               
c390afaedc51  2 days ago   /bin/sh -c find augur -type d -exec chmod ...  18.2MB               
a807b5b1e4f0  2 days ago   /bin/sh -c #(nop) COPY dir:9b05c533e03c748...  29.2kB               
6d26b101ecdd  2 days ago   /bin/sh -c #(nop) COPY dir:827994690b34ed6...  137kB                
48cb51e80e50  2 days ago   /bin/sh -c #(nop) COPY file:56f6d648dc46ad...  2.56kB               
9d1ea3088c9d  2 days ago   /bin/sh -c #(nop) COPY dir:a288439a9be001d...  18.2MB               
944d0a28c8bd  2 days ago   /bin/sh -c #(nop) COPY file:ca2856255209f8...  5.12kB               
88e315fac158  2 days ago   /bin/sh -c #(nop) COPY file:bb611e7747fea3...  3.58kB               
b44ad2da1861  2 days ago   /bin/sh -c #(nop) COPY file:288fa24ab30c18...  13.3kB               
1017c86b7025  2 days ago   /bin/sh -c --mount=type=cache,target=/root...  3.46GB               
6b0240de5e22  2 days ago   /bin/sh -c #(nop) COPY file:28db77a8470e7d...  2.56kB               
f07482629330  2 days ago   /bin/sh -c #(nop) COPY file:c253056d86a0fe...  514kB                
054d94147742  2 days ago   /bin/sh -c #(nop) COPY file:d1b0f125a568a5...  6.14kB               
96125df2261b  2 days ago   /bin/sh -c #(nop) WORKDIR /augur               0B                   
<missing>     2 days ago   /bin/sh -c #(nop) ENV UV_LOCKED=1              0B                   
<missing>     2 days ago   /bin/sh -c #(nop) ENV UV_LINK_MODE=copy        0B                   
<missing>     2 days ago   /bin/sh -c #(nop) ENV UV_COMPILE_BYTECODE=1    0B                   
<missing>     2 days ago   /bin/sh -c #(nop) COPY multi:f340754a871bb...  43.4MB               
7a8e07de3fdf  2 weeks ago  /bin/sh -c #(nop) EXPOSE 5000                  0B                   
<missing>     2 weeks ago  /bin/sh -c geckodriver --version               1.54kB               
b3133df7cdb0  2 weeks ago  /bin/sh -c firefox --version                   5.63kB               
6dba3dcd126d  2 weeks ago  /bin/sh -c GECKODRIVER_VERSION=$(curl -s h...  6.14MB               
eb4ce3c07f5c  2 weeks ago  /bin/sh -c set -x     && apt-get update   ...  1.24GB               
a17705e7c6f1  2 weeks ago  /bin/sh -c #(nop) ENV PATH="/usr/bin/:/usr...  0B                   
<missing>     2 weeks ago  /bin/sh -c #(nop) ENV DEBIAN_FRONTEND=noni...  0B                   
<missing>     2 weeks ago  /bin/sh -c #(nop) LABEL version="0.86.1"       0B                   
<missing>     2 weeks ago  /bin/sh -c #(nop) LABEL maintainer="outdoo...  0B                   FROM docker.io/library/python:3.11-slim-bullseye
<missing>     6 weeks ago  CMD ["python3"]                                0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  RUN /bin/sh -c set -eux;                       for src in idle3...  5.12kB              buildkit.dockerfile.v0
<missing>     6 weeks ago  RUN /bin/sh -c set -eux;                                            savedAptMark="$...  45.2MB      buildkit.dockerfile.v0
<missing>     6 weeks ago  ENV PYTHON_SHA256=8fb5f9fbc7609fa822cb3154...  0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  ENV PYTHON_VERSION=3.11.13                     0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628...  0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  RUN /bin/sh -c set -eux;                       apt-get update; ...  3.39MB      buildkit.dockerfile.v0
<missing>     6 weeks ago  ENV LANG=C.UTF-8                               0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  ENV PATH=/usr/local/bin:/usr/local/sbin:/u...  0B                   buildkit.dockerfile.v0
<missing>     6 weeks ago  # debian.sh --arch 'amd64' out/ 'bullseye'...  84.2MB               debuerreotype 0.15

MoralCode avatar Jul 17 '25 16:07 MoralCode
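
For anyone repeating this measurement, a listing like the `podman history --human` output above can be ranked by layer size with a short script. This is a minimal sketch that treats the last size-looking token on each line as the layer size and assumes decimal units (1 kB = 1000 bytes). The function names are made up for illustration, and the sample lines are copied from the listing above.

```python
import re

# Decimal multipliers: an assumption about how the SIZE column was rendered.
UNIT_BYTES = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}
SIZE_PATTERN = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)(kB|MB|GB|B)(?!\w)")


def parse_size(text):
    """Convert a size such as '3.46GB' or '4.61kB' to a whole number of bytes."""
    match = SIZE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNIT_BYTES[unit])


def layer_size(line):
    """Return the last size-looking token on a history line in bytes, or None."""
    tokens = SIZE_PATTERN.findall(line)
    if not tokens:
        return None
    number, unit = tokens[-1]
    return round(float(number) * UNIT_BYTES[unit])


def largest_layers(history_text, count=5):
    """Return the `count` largest layers as (bytes, line) pairs, biggest first."""
    sized = []
    for line in history_text.splitlines():
        size = layer_size(line)
        if size:  # skips the header row and zero-byte metadata layers
            sized.append((size, line.strip()))
    return sorted(sized, reverse=True)[:count]


sample = """\
1017c86b7025  2 days ago   /bin/sh -c --mount=type=cache,target=/root...  3.46GB
eb4ce3c07f5c  2 weeks ago  /bin/sh -c set -x     && apt-get update   ...  1.24GB
96125df2261b  2 days ago   /bin/sh -c #(nop) WORKDIR /augur               0B
"""
for size, line in largest_layers(sample):
    print(f"{size / 10**9:.2f} GB  {line[:12]}")
```

Saving the full history output to a file and passing its contents to `largest_layers` gives a quick view of which build steps are worth optimizing first.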