terraform-aws-gitlab-runner

Are docker images pruned from runners after use?

Open Glen-Moonpig opened this issue 4 years ago • 12 comments

Hello Niek, I have a question you might be able to help me with.

I recently had to increase the root volume size of the runners because they were running out of disk space during job execution. I think this happened because some pipelines started using large Docker images. I suspect the images are pulled from the registry onto the EC2 instance's root volume to execute the job, and that after the job completes they remain in the machine's local image cache.

Do you have any idea how the local image cache is managed on the runners? Is there any kind of automatic image pruning? Do you have any recommendations on how to clean up old images from the runner volumes?

Glen-Moonpig avatar Apr 01 '20 19:04 Glen-Moonpig

@Glen-Moonpig I've run into this as well and fixed it by adding gitlab-runner-docker-cleanup to my post-install script, which works really well for me:

locals {
  userdata_post_install = <<-POST
  if ! docker ps --format '{{.Names}}' | grep -w gitlab-runner-docker-cleanup &> /dev/null; then
      docker run -d \
          -e LOW_FREE_SPACE=40G \
          -e EXPECTED_FREE_SPACE=50G \
          -e LOW_FREE_FILES_COUNT=1048576 \
          -e EXPECTED_FREE_FILES_COUNT=2097152 \
          -e DEFAULT_TTL=10m \
          -e USE_DF=1 \
          --restart always \
          -v /var/run/docker.sock:/var/run/docker.sock \
          --name=gitlab-runner-docker-cleanup \
          quay.io/gitlab/gitlab-runner-docker-cleanup &
  fi
  POST
}

And then set userdata_post_install = local.userdata_post_install in the module.

fliphess avatar Apr 02 '20 09:04 fliphess

It's good to know that pulling the image takes a long time, which caused a race condition while registering the runner and resulted in being blocked on the GitLab API. I solved this by pulling the image in the background, hence the appended & :)

fliphess avatar Apr 02 '20 09:04 fliphess

Thanks @fliphess, I had seen this repo but wasn't sure whether it would be compatible with this module. I will try it out. Does the image run on the agent machine and clean up images from the runner machines?

Glen-Moonpig avatar Apr 02 '20 09:04 Glen-Moonpig

It doesn't on docker-machine, only on the local runner. I'm currently working on a new scaling gitlab-runner setup for the company I work for, and I haven't solved this yet.

GitLab provides tooling for cleanup: https://gitlab.com/gitlab-org/gitlab-runner/blob/master/packaging/root/usr/share/gitlab-runner/clear-docker-cache

You might be able to run this script as a post task or configure a cronjob through --amazonec2-userdata.
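
The cronjob idea could look roughly like the sketch below. It is only an illustration, not something taken from the module: the function name install_prune_cron, the file name docker-prune, the hourly schedule and the 24h age filter are all assumptions, and it uses Docker's built-in docker system prune rather than the clear-docker-cache script.

```shell
#!/bin/sh
# Sketch of a userdata fragment for the docker-machine instances.
# install_prune_cron and the "docker-prune" file name are hypothetical.

# install_prune_cron DIR: write a cron.d style entry into DIR
install_prune_cron() {
  # Every hour, as root: remove stopped containers, unused networks and
  # unused images older than 24 hours. Volumes are left alone.
  cat > "$1/docker-prune" <<'EOF'
0 * * * * root docker system prune --all --force --filter "until=24h" > /dev/null 2>&1
EOF
}
```

In a real userdata file the last line would be something like install_prune_cron /etc/cron.d. Passing the directory as an argument makes it possible to try the function against a scratch directory first.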

fliphess avatar Apr 02 '20 10:04 fliphess

@Glen-Moonpig I only use the docker-machine setup, with a short life cycle for the EC2 instances, so I don't have this issue. Would you like to update the docker example with the post-install script? It sounds like a good approach.

npalm avatar Apr 04 '20 09:04 npalm

@fliphess Thanks for sharing! What value are you using for runners_root_size with that userdata_post_install script?

lsorber avatar Jul 02 '20 12:07 lsorber

Hey @lsorber We are using 150GB disks.

We do some integration testing that requires pulling lots of different images at the same time, which needs a lot of storage. The cleaner container wasn't always aware of images pulled directly through the Docker daemon, so we added a cronjob that runs nightly to make sure those images are removed as well:

  ## Create cleanup cronjob (quoted delimiter so that $3 is written literally instead of being expanded)
  cat > /usr/local/bin/clean-docker <<'CRON'
  #!/bin/bash

  # Remove exited containers
  docker ps -a -q -f status=exited    | xargs --no-run-if-empty docker rm -v

  # Remove dangling images
  docker images -f "dangling=true" -q | xargs --no-run-if-empty docker rmi

  # Remove unused images
  docker images | awk '/ago/  { print $3}' | xargs --no-run-if-empty docker rmi

  # Remove dangling volumes
  docker volume ls -qf dangling=true  | xargs --no-run-if-empty docker volume rm
  CRON

  # Make executable
  chmod +x  /usr/local/bin/clean-docker

  # Add to cron
  echo -e '[email protected]\n0 1 * * * root /bin/flock -n /tmp/.docker-clean.lock /usr/local/bin/clean-docker > /dev/null 2>&1\n' > /etc/cron.d/gitlab-runner-cleaner

This is a bazooka for shooting mosquitoes, but it ensures all leftovers are removed nightly.

fliphess avatar Jul 02 '20 13:07 fliphess

As the link to the Gitlab documentation is no longer working, try out this one: https://docs.gitlab.com/runner/executors/docker.html

The solution is to use either one of the scripts above or the GitLab scripts I mentioned.

kayman-mk avatar Oct 14 '21 11:10 kayman-mk

And if required, the cache cleaner script is here (the url has changed):

https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/packaging/root/usr/share/gitlab-runner/clear-docker-cache

fliphess avatar Nov 19 '21 10:11 fliphess

@fliphess But the script has to run on the docker+machine instances and not on the agent. Any idea how to get it working?

kayman-mk avatar Sep 28 '22 12:09 kayman-mk

@kayman-mk Sorry, but I'm not using the module anymore, as I switched jobs and am now using the Kubernetes executor.

Have a look at the --amazonec2-userdata setting for docker-machine. You can find an example of using it in gitlab-runner over here and here, and you should be able to add some commands there.

But tbh, I think the easiest way to do this is to bake your own AMI with all the required cleanup tools, crons etc. added to it, instead of adding complex scripts through userdata.

fliphess avatar Sep 28 '22 18:09 fliphess

I activated the following via crontab on my agents (via userdata_post_install). It works like a charm and cleans the Docker images/containers/volumes every hour.

# install docker cleanup scripts for the docker+machine instances
cat << "EOF" > /etc/cron.hourly/clean-docker-machine-caches.sh
#!/usr/bin/env bash
# -q prints machine names only, so the "NAME" header row is not treated as a machine
for i in $(docker-machine ls -q); do
  (echo "sudo -s;" && cat /usr/share/gitlab-runner/clear-docker-cache) | docker-machine ssh "$i"
done
EOF

chmod a+x /etc/cron.hourly/clean-docker-machine-caches.sh
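
A possible variation on that loop, as a sketch only: it skips machines that are not in the Running state and logs a message when one cleanup fails instead of stopping silently. Note that it swaps the clear-docker-cache script for a plain docker system prune, and assumes the SSH user on the instances needs sudo to talk to Docker. The DOCKER_MACHINE variable and the function name clean_running_machines are hypothetical helpers for dry runs.

```shell
#!/bin/sh
# Sketch only: prune Docker data on every running docker-machine instance.
# DOCKER_MACHINE is a hypothetical override; it defaults to the real binary.

clean_running_machines() {
  dm="${DOCKER_MACHINE:-docker-machine}"
  # -q prints names only; the state filter skips machines that are not running
  "$dm" ls -q --filter state=Running | while read -r machine; do
    [ -n "$machine" ] || continue
    # </dev/null keeps ssh from swallowing the rest of the machine list
    "$dm" ssh "$machine" "sudo docker system prune --all --force" </dev/null \
      || echo "cleanup failed on $machine" >&2
  done
}
```

Setting DOCKER_MACHINE to a stub command that only echoes its arguments gives a dry run before pointing it at real instances.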

kayman-mk avatar Oct 13 '22 08:10 kayman-mk

@Glen-Moonpig Seems to be solved now and can be closed, right?

kayman-mk avatar Dec 24 '22 21:12 kayman-mk