Improve time and space efficiency of backend Docker container build
Is your feature request related to a problem? If so, please describe the problem: Disk space limitations make it hard to build the backend Docker image (the build had used at LEAST 6GB before it failed on my Fedora VM, possibly 12GB if podman needs 2x the space to write the final image).
Potential solutions: After looking through the Dockerfiles I noticed a few things that could be improved:
- Most of the space taken up by the image is due to Rust and Go dependencies that likely aren't needed at runtime, since both of those languages ship fat binaries with everything in one place (as far as I know; I only have passing experience with both of those languages)
- Relatedly, the source code that is cloned as part of these builds is also kept around. Removing it could save an additional 334M
- The build, specifically install-workers-deps.sh, includes the NLTK popular metapackage, which includes lots of other packages. Are all of these sub-packages being used? (There's between 8k and 139M of savings depending on what can be excluded; I have a list of them by size in my notes)
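One rough way to approach the "are all of these sub-packages being used?" question is to scan the Python sources for the NLTK resources they mention. The sketch below is only a heuristic: the regular expressions, the function names `find_nltk_resources` and `scan_tree`, and the idea that usage shows up as `nltk.download(...)` calls or `nltk.corpus` references are all assumptions made for illustration, not a description of how augur actually loads NLTK data. Resources loaded indirectly (for example by a tokenizer that pulls in its own model) would not be detected.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions, not an exhaustive list of NLTK loading styles).
# nltk.download("punkt") or nltk.download('stopwords', quiet=True)
DOWNLOAD_RE = re.compile(r"nltk\.download\(\s*['\"]([\w.-]+)['\"]")
# from nltk.corpus import stopwords, wordnet as wn
IMPORT_RE = re.compile(r"from\s+nltk\.corpus\s+import\s+([^\n#]+)")
# nltk.corpus.brown.words()
ATTR_RE = re.compile(r"nltk\.corpus\.(\w+)")


def find_nltk_resources(source: str) -> set[str]:
    """Return the NLTK resource names referenced in one piece of source text."""
    found = set(DOWNLOAD_RE.findall(source))
    for match in IMPORT_RE.finditer(source):
        names = match.group(1).replace("(", "").replace(")", "")
        for part in names.split(","):
            # Drop any "as alias" suffix and keep the imported name.
            name = part.strip().split(" as ")[0].strip()
            if name.isidentifier():
                found.add(name)
    found.update(ATTR_RE.findall(source))
    return found


def scan_tree(root) -> set[str]:
    """Collect referenced NLTK resource names from every .py file under root."""
    names = set()
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        names |= find_nltk_resources(text)
    return names
```

Running `scan_tree` on a checkout and comparing the result against the contents of the popular collection would give a first list of candidates to drop, which would still need to be confirmed by someone who knows the workers.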
My remaining questions so far:
- Is there a reason that these dependencies are being built from source?
- If not, would it be better to contribute these build steps upstream to places like OpenSSF so augur can rely on fetching the binaries from a popular registry (like crates.io or whatever Go uses)?
Additional context: Slack thread (in CHAOSS Slack) about this: https://chaoss-workspace.slack.com/archives/C0226ELG6R4/p1739815260328199
This PR seems to at least have been attempting a partial solution for the first two parts of this by creating a separate build step: https://github.com/chaoss/augur/pull/2947
The large image size may also cause problems for users on older versions of podman, which may have trouble committing very large image layers.
Also likely related: #2982
@GregSutcliffe will have some thoughts on this. I know he has mentioned some things around Go and that there might be a significantly more efficient way to get the git info without needing a full clone
I'm refactoring the Docker container into more of a builder-pattern style (with separate containers for Go and Rust)
Are there any dependencies relying on Rust right now? Because I don't see any
@MoralCode It looks like we are using Rust in order to use cargo to install geckodriver. Sean added that install to the Dockerfile in 5a14c422504022da0ea6892f40dd954cef856da6
It seems like that has since been removed/replaced with a tarball method of installing Firefox and geckodriver: https://github.com/chaoss/augur/commit/9eb4b6150739a12166a84b7f9f17d84d42011104
@sgoggins do you know of any Rust dependencies?
@MoralCode Does #3012 close this?
That PR addresses the first couple of points, but I think I was still curious about:
The build, specifically install-workers-deps.sh, includes the NLTK popular metapackage, which includes lots of other packages. Are all of these sub-packages being used? (There's between 8k and 139M of savings depending on what can be excluded; I have a list of them by size in my notes)
My remaining questions so far:
Is there a reason that these dependencies are being built from source? If not, would it be better to contribute these build steps upstream to places like OpenSSF so augur can rely on fetching the binaries from a popular registry (like crates.io or whatever Go uses)?
@MoralCode ? :)
What do you need from me for this? Right now I'm mostly waiting on answers to some of my remaining questions to see if I can reduce the space usage even further by:
- removing unnecessary NLTK libraries (waiting on info regarding which ones are actually used)
- using scc/scorecard binaries downloaded from Go's package manager rather than building from source, to save even more disk space and reduce the need to have builder containers at all
I will probably also fold in the pip duplication from #3045 and/or #3053 if/when I end up making a PR for them and/or the above fixes
Current dockerfile commands that take the most space are:
https://github.com/chaoss/augur/blob/dc17a8f57e47a247aefa11c61f9c777b658aebd0/docker/backend/Dockerfile#L88 (3.46GB) and https://github.com/chaoss/augur/blob/dc17a8f57e47a247aefa11c61f9c777b658aebd0/docker/backend/Dockerfile#L28 (1.24GB).
$ podman history --human e811d5319587
ID CREATED CREATED BY SIZE COMMENT
f5d1651065a4 2 days ago /bin/sh -c #(nop) CMD ["/init.sh"] 0B
<missing> 2 days ago /bin/sh -c #(nop) ENTRYPOINT ["/bin/bash",... 0B
<missing> 2 days ago /bin/sh -c chmod +x /entrypoint.sh /init.sh 4.61kB
542252eed461 2 days ago /bin/sh -c #(nop) COPY file:469bc734639769... 2.05kB
47a9ab059f88 2 days ago /bin/sh -c #(nop) COPY file:28e7735772972f... 3.07kB
26df5a1b1799 2 days ago /bin/sh -c ln -s /cache /augur/augur/stati... 3.58kB
a3e3853a7768 2 days ago /bin/sh -c mkdir -p repos/ logs/ /augur/fa... 3.58kB
5c7cec6c75e8 2 days ago /bin/sh -c ${SCORECARD_DIR}/scorecard version 1.54kB
b4ccb71fbab5 2 days ago /bin/sh -c ${SCC_DIR}/scc --version 1.54kB
790349537c42 2 days ago /bin/sh -c #(nop) COPY file:7c3300d56112b9... 78.3MB
68e87e6170d9 2 days ago /bin/sh -c #(nop) ENV SCORECARD_DIR=/score... 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:4d4f7c23877871... 9.43MB
fe33f0b1d72c 2 days ago /bin/sh -c #(nop) ENV SCC_DIR=/scc 0B
<missing> 2 days ago /bin/sh -c #(nop) ENV PATH="/augur/.venv/b... 0B
<missing> 2 days ago /bin/sh -c --mount=type=cache,target=/root... 56.3kB
b1105dd18933 2 days ago /bin/sh -c find scripts -exec chmod u=rwx,... 138kB
1fc1d2b312e4 2 days ago /bin/sh -c find keyman -type d -exec chmod... 29.7kB
c390afaedc51 2 days ago /bin/sh -c find augur -type d -exec chmod ... 18.2MB
a807b5b1e4f0 2 days ago /bin/sh -c #(nop) COPY dir:9b05c533e03c748... 29.2kB
6d26b101ecdd 2 days ago /bin/sh -c #(nop) COPY dir:827994690b34ed6... 137kB
48cb51e80e50 2 days ago /bin/sh -c #(nop) COPY file:56f6d648dc46ad... 2.56kB
9d1ea3088c9d 2 days ago /bin/sh -c #(nop) COPY dir:a288439a9be001d... 18.2MB
944d0a28c8bd 2 days ago /bin/sh -c #(nop) COPY file:ca2856255209f8... 5.12kB
88e315fac158 2 days ago /bin/sh -c #(nop) COPY file:bb611e7747fea3... 3.58kB
b44ad2da1861 2 days ago /bin/sh -c #(nop) COPY file:288fa24ab30c18... 13.3kB
1017c86b7025 2 days ago /bin/sh -c --mount=type=cache,target=/root... 3.46GB
6b0240de5e22 2 days ago /bin/sh -c #(nop) COPY file:28db77a8470e7d... 2.56kB
f07482629330 2 days ago /bin/sh -c #(nop) COPY file:c253056d86a0fe... 514kB
054d94147742 2 days ago /bin/sh -c #(nop) COPY file:d1b0f125a568a5... 6.14kB
96125df2261b 2 days ago /bin/sh -c #(nop) WORKDIR /augur 0B
<missing> 2 days ago /bin/sh -c #(nop) ENV UV_LOCKED=1 0B
<missing> 2 days ago /bin/sh -c #(nop) ENV UV_LINK_MODE=copy 0B
<missing> 2 days ago /bin/sh -c #(nop) ENV UV_COMPILE_BYTECODE=1 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY multi:f340754a871bb... 43.4MB
7a8e07de3fdf 2 weeks ago /bin/sh -c #(nop) EXPOSE 5000 0B
<missing> 2 weeks ago /bin/sh -c geckodriver --version 1.54kB
b3133df7cdb0 2 weeks ago /bin/sh -c firefox --version 5.63kB
6dba3dcd126d 2 weeks ago /bin/sh -c GECKODRIVER_VERSION=$(curl -s h... 6.14MB
eb4ce3c07f5c 2 weeks ago /bin/sh -c set -x && apt-get update ... 1.24GB
a17705e7c6f1 2 weeks ago /bin/sh -c #(nop) ENV PATH="/usr/bin/:/usr... 0B
<missing> 2 weeks ago /bin/sh -c #(nop) ENV DEBIAN_FRONTEND=noni... 0B
<missing> 2 weeks ago /bin/sh -c #(nop) LABEL version="0.86.1" 0B
<missing> 2 weeks ago /bin/sh -c #(nop) LABEL maintainer="outdoo... 0B FROM docker.io/library/python:3.11-slim-bullseye
<missing> 6 weeks ago CMD ["python3"] 0B buildkit.dockerfile.v0
<missing> 6 weeks ago RUN /bin/sh -c set -eux; for src in idle3... 5.12kB buildkit.dockerfile.v0
<missing> 6 weeks ago RUN /bin/sh -c set -eux; savedAptMark="$... 45.2MB buildkit.dockerfile.v0
<missing> 6 weeks ago ENV PYTHON_SHA256=8fb5f9fbc7609fa822cb3154... 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ENV PYTHON_VERSION=3.11.13 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628... 0B buildkit.dockerfile.v0
<missing> 6 weeks ago RUN /bin/sh -c set -eux; apt-get update; ... 3.39MB buildkit.dockerfile.v0
<missing> 6 weeks ago ENV LANG=C.UTF-8 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ENV PATH=/usr/local/bin:/usr/local/sbin:/u... 0B buildkit.dockerfile.v0
<missing> 6 weeks ago # debian.sh --arch 'amd64' out/ 'bullseye'... 84.2MB debuerreotype 0.15
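To pick the biggest layers out of output like the above without reading it by eye, the listing can be ranked by its SIZE column. This is a minimal sketch that assumes the human-readable sizes use decimal units (kB, MB, GB) and that the size is the last size-like token on each line; the names `layer_size_bytes` and `largest_layers` are made up for this example.

```python
import re

# Decimal multipliers are an assumption about how --human formats sizes.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# A number immediately followed by a unit, not embedded in a longer word.
SIZE_RE = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)(kB|MB|GB|TB|B)(?!\w)")


def layer_size_bytes(line: str):
    """Return the layer size in bytes for one history line, or None if absent."""
    matches = SIZE_RE.findall(line)
    if not matches:
        return None  # e.g. the header row
    # The COMMENT column follows SIZE, so take the last size-like token.
    number, unit = matches[-1]
    return round(float(number) * UNITS[unit])


def largest_layers(history: str, top: int = 5):
    """Return the `top` largest layers as (size_in_bytes, line) tuples."""
    sized = []
    for line in history.splitlines():
        size = layer_size_bytes(line)
        if size is not None:
            sized.append((size, line.strip()))
    sized.sort(key=lambda item: item[0], reverse=True)
    return sized[:top]
```

Fed the listing above, `largest_layers(text, top=2)` should surface the 3.46GB and 1.24GB layers called out earlier, which makes it easy to re-check the ranking after each Dockerfile change.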