
reduce docker images

Open markus2330 opened this issue 2 years ago • 16 comments

  • [ ] scripts/docker/jenkinsnode/Dockerfile see #4620
  • [ ] scripts/docker/fedora/32/Dockerfile
  • [ ] remove unused packages (e.g. ninja vs. make)

markus2330 avatar Nov 05 '22 08:11 markus2330

As 0x6178656c wrote in https://github.com/ElektraInitiative/libelektra/pull/4620#issuecomment-1295395453:

  • If this image is still relevant, this should be documented accordingly
  • If this image is no longer used, it should be removed from the repository

markus2330 avatar Nov 05 '22 08:11 markus2330

We are regularly running into "no space left" problems because of too many Docker images, so I tagged it as urgent and removed the "probably to be removed".

@mpranj any other suggestions other than the two Docker images above?

markus2330 avatar Nov 13 '22 06:11 markus2330

One thing I noticed about our images is that they are very big. Maybe we can look into making them smaller; that should help with the disk space problems.

kodebach avatar Nov 13 '22 12:11 kodebach

We are regularly running into "no space left" problems because of too many Docker images, so I tagged it as urgent and removed the "probably to be removed".

I think this will not save any space.

AFAIK, removing unused images will do nothing for our disk space usage, as the images are not built by the pipeline. The images are built only when needed. We are actively using many images, so that is the problem.

One thing I noticed about our images is that they are very big.

It would be great if we could do something about this.

Maybe we can add the docker build option --squash to avoid storing multiple layers of the filesystem. There are always pros and cons, but it's worth a shot.
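
A minimal sketch of how that option is used (the image tag and Dockerfile directory below are placeholders, and --squash requires the daemon's experimental features to be enabled):

    # build the image with all layers squashed into a single one
    docker build --squash -t build-elektra-fedora-36 scripts/docker/fedora/36/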

mpranj avatar Nov 13 '22 14:11 mpranj

Maybe we can add the docker build option --squash to avoid storing multiple layers of the filesystem.

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

The images are built only when needed.

So do we actually build new images for every Jenkins run? Is there any kind of auto-cleanup?


Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem? Could there be something else that is eating disk space too, e.g. log files with long retention periods, or artifacts of old builds?

kodebach avatar Nov 13 '22 15:11 kodebach

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

Yes, but I'll test this now to see if there is any difference. Also, I know this should be the case on a single machine, but I have a feeling we're not reusing layers anyway.

Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem?

Yes, pretty sure it is at least the biggest problem. Most other things are cleaned up.

So do we actually build new images for every Jenkins run? Is there any kind of auto-cleanup?

Not for every run, but when they are needed. So images are reused once they are built. They are rebuilt monthly so that the packages are updated periodically.

mpranj avatar Nov 13 '22 15:11 mpranj

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

Unfortunately you're right. I've tested the --squash option, and for the build-elektra-fedora-36 image the difference is only 2.16GB vs. 2.03GB.

mpranj avatar Nov 13 '22 19:11 mpranj

Okay, how exactly is our Fedora 36 Image over 2GB in size, when the base fedora:36 image is <60MB (see Docker Hub)? There has to be something in there that we don't need...
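
One way to check where the size actually goes, sketched here with a placeholder image tag, is to look at the per-layer sizes:

    # show each layer, the instruction that created it and its size
    docker history --no-trunc build-elektra-fedora-36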

Another thing we could do: Remove Java from all images except one, maybe even remove it completely from Jenkins and only test on Cirrus. The JVM should be the same everywhere.

kodebach avatar Nov 13 '22 19:11 kodebach

AFAIK, removing unused images will do nothing for our disk space usage, as the images are not built by the pipeline. The images are built only when needed.

Yes, this is why I extended the scope of this issue: the idea was to suggest which used Docker images (probably the least important ones) to remove, or how to make them smaller.

Another thing we could do: Remove Java from all images except one, maybe even remove it completely from Jenkins and only test on Cirrus. The JVM should be the same everywhere.

Actually, Java in particular is very prone to problems with CMake detection and the like, so it is good to have these tests across several distributions.

Btw, the issue seems to be less urgent than I thought. Disk usage is now 346G used with 1.5T available, i.e. about 20% used, so the problem is simply that running docker prune -af once a month was not enough.

Further suggestions on what to reduce are nevertheless welcome. At some point we will need to do the cleanup.

markus2330 avatar Nov 14 '22 10:11 markus2330

Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem? Could there be something else that is eating disk space too, e.g. log files with long retention periods, or artifacts of old builds?

After running docker prune -af on a7, the disk space usage went from 100% to less than 20%.

markus2330 avatar Nov 14 '22 10:11 markus2330

the idea was to suggest which used Docker images (probably the least important ones) to remove or how to make them smaller.

I see we have 4 different Debian Bullseye images? Why? I understand the minimal image for testing without installed dependencies, but the rest are probably wasting space. The same goes for Debian Buster.

Also, if docker image prune -af (or even docker system prune) cleaned up > 1TB of space, I would really be interested in what exactly was removed; e.g. docker image ls before and afterwards would be interesting.

Additionally, we can probably run docker image prune (without -a) much more often. It should not remove anything we need.
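
For reference, a quick sketch of the difference between the two variants (standard Docker CLI behaviour, nothing specific to our setup):

    # removes only dangling images (untagged and not referenced); safe to run often
    docker image prune -f

    # additionally removes every image without an associated container,
    # so the agents have to pull the current images again afterwards
    docker image prune -af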

kodebach avatar Nov 14 '22 11:11 kodebach

Btw, the issue seems to be less urgent than I thought. Disk usage is now 346G used with 1.5T available, i.e. about 20% used, so the problem is simply that running docker prune -af once a month was not enough.

Also, if docker image prune -af (or even docker system prune) cleaned up > 1TB of space

I seriously doubt this happened. Usually it cleans up about 100-200GB. Maybe we should prune -af weekly? (prune -f is run daily, prune -af is run monthly.) Note that deleting all images also means that the current ones need to be fetched from our Docker registry, which has a rather slow connection.
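
Just to illustrate, a weekly prune could be scheduled roughly like this; the cron entry, time and log path are assumptions, not our actual setup:

    # hypothetical root crontab entry: every Sunday at 03:00, remove all unused images
    0 3 * * 0  docker image prune -af >> /var/log/docker-prune.log 2>&1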

What machine are you talking about?

On a7 we store:

  • the Docker registry on spinning 2TB disks.
  • /var/lib/docker, JenkinsHome and other stuff on the 250GB SSD, which is usually what runs out of space.

What might be a problem: the build agents keep the current images which they need (so far, everything is OK). When a Dockerfile is changed, a new version of this image is built and the build agents retrieve it. Now we have two versions of this image per build agent. The issue worsens when multiple PRs change images multiple times.

mpranj avatar Nov 14 '22 15:11 mpranj

I see we have 4 different Debian Bullseye images? Why?

To also test the CMake exclusion of modules. Maybe we should make these images build upon each other to use less space?

Maybe we should prune -af weekly?

Yes, that sounds like the easiest solution for now. Is there some way to only clean up the images that weren't used for a week?

What machine are you talking about?

In https://github.com/ElektraInitiative/libelektra/issues/4637#issuecomment-1313475416 I was talking about a7 of the recent incident https://github.com/ElektraInitiative/libelektra/issues/160#issuecomment-1312652971.

markus2330 avatar Nov 16 '22 05:11 markus2330

To also test the CMake exclusion of modules. Maybe we should make these images build upon each other to use less space?

Building the images on top of each other would definitely help.

There are probably a few other things we can do, like reducing the number of RUN instructions to reduce layers, or checking that we're not installing e.g. GUIs or other unnecessary stuff; see the sketch below.
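
As a rough illustration of both ideas (the image names and package lists below are made up, not our actual Dockerfiles): a shared base image that the variants build upon, with one RUN per logical step so fewer layers are stored:

    # hypothetical shared base image
    FROM debian:bullseye
    RUN apt-get update \
        && apt-get install -y --no-install-recommends build-essential cmake git \
        && rm -rf /var/lib/apt/lists/*

    # hypothetical variant image that builds on the base instead of repeating it
    FROM build-elektra-debian-bullseye-base
    RUN apt-get update \
        && apt-get install -y --no-install-recommends default-jdk \
        && rm -rf /var/lib/apt/lists/*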

Is there some way to only clean up the images that weren't used for a week?

Yes, the --filter argument can be used with a timestamp. See e.g. this page
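
A sketch of how that could look for the "not used for a week" case; note that the until filter goes by image creation time, not by when an image was last used:

    # remove all unused images that were created more than a week (168h) ago
    docker image prune -af --filter "until=168h"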

kodebach avatar Nov 16 '22 14:11 kodebach

Fedora 32 Docker image analysis

So I did a small investigation of the "scripts/docker/fedora/32/Dockerfile" image. I analyzed its layers, and most of the size comes from the installed packages. The whole image is 2.61GB, and around 2.4GB of that is packages.

Top 10 packages by size.

Package                                                   Size (MB)
golang-bin-1.14.15-3.fc32.x86_64                          255.98
java-11-openjdk-headless-11.0.11.0.9-0.fc32.x86_64        170.76
java-1.8.0-openjdk-headless-1.8.0.292.b10-0.fc32.x86_64   117.47
clang-libs-10.0.1-3.fc32.x86_64                           92.07
gcc-10.3.1-1.fc32.x86_64                                  81.71
llvm-libs-10.0.1-4.fc32.x86_64                            78.23
glibc-debuginfo-2.31-6.fc32.x86_64                        76.42
mesa-dri-drivers-20.2.3-1.fc32.x86_64                     65.74
glibc-debuginfo-common-2.31-6.fc32.x86_64                 57.20
python27-2.7.18-8.fc32.x86_64                             54.59
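
For anyone who wants to reproduce such a list, one possible way (a sketch, run inside the container; not taken from our scripts) is to ask rpm for the installed size of every package and sort by it:

    # list installed packages by size, largest first (sizes in bytes)
    rpm -qa --queryformat '%{SIZE} %{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort -rn | head -n 10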

Improvements

Adding the install_weak_deps=False option

dnf install --setopt=install_weak_deps=False

--setopt=install_weak_deps=False: this flag disables the installation of weak dependencies, which can help reduce the number of unnecessary packages installed. It is equivalent to --no-install-recommends in apt-get.
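
In a Dockerfile this could look roughly as follows; the package names are placeholders, and combining the flag with dnf clean all in the same RUN keeps the package cache out of the layer:

    RUN dnf install -y --setopt=install_weak_deps=False cmake gcc-c++ git \
        && dnf clean all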

Result

Adding this dnf option reduced the image size by ~15%.

Maybe it would be interesting to use a container registry like ghcr.io to reduce duplication and to build some base images that other Dockerfiles could build upon.
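
A purely hypothetical sketch of what that could look like; the organization, image name and tag below are made up and do not refer to an existing registry path:

    # base image published once, e.g. by a CI job:
    #   docker build -t ghcr.io/elektrainitiative/build-elektra-base:fedora-32 .
    #   docker push ghcr.io/elektrainitiative/build-elektra-base:fedora-32

    # other Dockerfiles then start from it instead of repeating the common setup
    FROM ghcr.io/elektrainitiative/build-elektra-base:fedora-32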

4ydan avatar Jun 22 '23 11:06 4ydan

Thank you for the investigation. Yes, please add this option.

markus2330 avatar Jul 02 '23 17:07 markus2330

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

github-actions[bot] avatar Jul 02 '24 01:07 github-actions[bot]

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

github-actions[bot] avatar Jul 16 '24 01:07 github-actions[bot]