terraform-aws-gitlab-runner

dind docker builds: how to make it work like GitLab shared runners?

Open jrr opened this issue 2 years ago • 4 comments

We're using a runner based on the runner-default example, and overall it's been great: we're able to run most of our builds on it, and they run much faster than they did on the GitLab shared runners we were using before.

The jobs we're not yet running on the EC2 runner are those that perform Docker builds, using Docker-in-Docker (dind) like this:

example-job:
  image: docker:20.10
  variables:
    DOCKER_HOST: tcp://docker:2375/
  services:
    - docker:20.10-dind
  script: |
    docker build (...)

These work on the shared runners, but fail on our EC2 runners with a message like this:

Cannot connect to the Docker daemon at tcp://docker:2375/. Is the docker daemon running?
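To see what the job container actually gets on each runner, a throwaway job like the one below may help. It is a hypothetical diagnostic (the job name dind-smoke-test is made up, not from this thread) that mirrors the failing job but prints the DOCKER_* variables the job sees and runs docker info, which should fail with the same "Cannot connect" message if the dind daemon is not reachable.

```yaml
# Hypothetical diagnostic job (not from the original thread): same image,
# service and DOCKER_HOST as the failing job, but it only inspects the setup.
dind-smoke-test:
  image: docker:20.10
  variables:
    DOCKER_HOST: tcp://docker:2375/
  services:
    - docker:20.10-dind
  script:
    - env | grep '^DOCKER_' || true   # list DOCKER_* variables visible to the job
    - docker info                     # errors out if the daemon cannot be reached
```

Running it on both a shared runner and the EC2 runner and comparing the output is one way to spot which setting differs.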

Based on my current shaky understanding of the various pieces (this project, docker-machine, GitLab runners, etc.), I think the important question is:

How do I configure an AWS runner to be able to perform dind docker builds the same way as GitLab's runners?

Would you expect the above job to work on a runner provisioned with the runner-default example? (If so, I'll diff my config against it more closely.)

Do I need to set any specific inputs?

Here's my current configuration:

main.tf
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_security_group" "default" {
  name   = "default"
  vpc_id = module.vpc.vpc_id
}
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.70"

  name = "vpc-${var.environment}"
  cidr = "10.0.0.0/16"

  azs             = [data.aws_availability_zones.available.names[0]]
  private_subnets = ["10.0.1.0/24"]
  public_subnets  = ["10.0.101.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
  enable_s3_endpoint = true

  tags = {
    Environment = var.environment
  }
}
module "gitlab-runner" {
  # https://registry.terraform.io/modules/npalm/gitlab-runner/aws/
  source  = "npalm/gitlab-runner/aws"
  version = "4.27.0"

  aws_region  = var.aws_region
  environment = var.environment

  vpc_id                   = module.vpc.vpc_id
  subnet_ids_gitlab_runner = module.vpc.private_subnets
  subnet_id_runners        = element(module.vpc.private_subnets, 0)

  runners_name       = var.runner_name
  runners_gitlab_url = "https://gitlab.com"

  gitlab_runner_registration_config = {
    registration_token = var.registration_token
    tag_list           = var.tag_list
    description        = var.runner_description
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

}

variable "registration_token" {
  description = "gitlab registration token"
  type        = string
}
variable "aws_region" {
  type = string
}

variable "environment" {
  description = "A name that identifies the environment, will be used as a prefix and for tagging."
  type        = string
  default     = "runners-default"
}

variable "runner_name" {
  description = "Name of the runner, will be used in the runner config.toml"
  type        = string
}

variable "runner_description" {
  description = "This description will appear in the GitLab runner list"
  type        = string
}

variable "tag_list" {
  description = "GitLab tags for this runner"
  type        = string
}

Hopefully I'm missing something obvious, but it might be worth adding a section to the README on "here's how to make your runner support dind docker builds", perhaps with separate guidance for TLS and non-TLS setups.

jrr, Aug 25 '21

Hey @jrr, I faced the same thing, and this issue solved it for me!

keatmin, Nov 20 '21

For anyone coming across this, we've managed to get Docker builds working on the EC2 runner by adjusting the CI job to look like this:

example-job:
  services:
    # Configuration to support docker builds on custom EC2 runner from
    # https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27300#note_571697847
    - name: docker:dind
      command: ["--tls=false"]
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_TLS_CERTDIR: ""
  image: docker:20.10
  script: |
    docker build (...)

But the issue's question remains: what would it take to avoid the --tls=false and DOCKER_TLS_CERTDIR workarounds?
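For comparison, GitLab's own documentation describes a TLS-enabled dind setup that sets the client variables explicitly instead of disabling TLS. The sketch below adapts that pattern to the job above; the job name is hypothetical, and it assumes the runner is privileged and shares /certs/client between the job and service containers (which, as far as I understand, is what runners_privileged and runners_add_dind_volumes in this module are for). Treat it as a starting point, not a verified fix for this module.

```yaml
# Sketch adapted from GitLab's "dind with TLS enabled" docs; job name is hypothetical.
# Assumes a privileged runner that mounts /certs/client into job and service containers.
example-job-tls:
  image: docker:20.10
  services:
    - docker:20.10-dind
  variables:
    DOCKER_HOST: tcp://docker:2376                 # dind's TLS port (2375 is the plain-text port)
    DOCKER_TLS_CERTDIR: "/certs"                   # dind generates server and client certs here
    DOCKER_TLS_VERIFY: 1                           # client verifies the daemon's certificate
    DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client" # client certs, shared via the /certs/client volume
  script:
    - docker info
    - docker build (...)
```

The key difference from the workaround is that the client talks to port 2376 and reads the certificates dind wrote, so nothing on the daemon side has to be turned off.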

Update 3/2022

For reference, my project's latest (and working!) dind builds look like this:

docker-build:
  tags:
    - ec2-runner
  services:
    - name: docker:20.10.12-dind
      alias: docker
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  image: docker:20.10.12
  script: docker build (...)

Relevant inputs from the Terraform project:

gitlab_runner_version    = "14.8.2"
runners_privileged       = "true"
runners_add_dind_volumes = true
runners_helper_image     = "..." // (we host this ourselves, but it matches the runner version above)

jrr, Feb 25 '22

You can set DOCKER_TLS_CERTDIR in the GitLab Runner config.toml to make it the default for all jobs:

[[runners]]
  environment = ["DOCKER_TLS_CERTDIR=/certs"]

With this module, you can set the same thing through the runners_environment_vars input:

runners_environment_vars = ["DOCKER_TLS_CERTDIR=/certs"]

jonpas, Apr 25 '22

Hi guys,

I have tried your configuration, but I get this warning:

WARNING: Service docker:dind is already created. Ignoring.
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-csrvkpu-project-29177080-concurrent-0-0c7113aa9b112e39-docker-0 probably didn't start properly.
Health check error:
service "runner-csrvkpu-project-29177080-concurrent-0-0c7113aa9b112e39-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2022-06-20T06:48:35.784174517Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:36.233743156Z ....................................................................................++++
2022-06-20T06:48:36.457276661Z ...........................................................++++
2022-06-20T06:48:36.457737473Z e is 65537 (0x010001)
2022-06-20T06:48:36.470590018Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:37.377644029Z ....................................................................................................................................................................................................................................................++++
2022-06-20T06:48:37.391673751Z ..++++
2022-06-20T06:48:37.392155836Z e is 65537 (0x010001)
2022-06-20T06:48:37.417577680Z Signature ok
2022-06-20T06:48:37.417591929Z subject=CN = docker:dind server
2022-06-20T06:48:37.417799225Z Getting CA Private Key
2022-06-20T06:48:37.428089288Z /certs/server/cert.pem: OK
2022-06-20T06:48:37.430983684Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:37.583050969Z .......................................++++
2022-06-20T06:48:37.877757247Z ..............................................................................++++
2022-06-20T06:48:37.878241583Z e is 65537 (0x010001)
2022-06-20T06:48:37.900008285Z Signature ok
2022-06-20T06:48:37.900023204Z subject=CN = docker:dind client
2022-06-20T06:48:37.900228438Z Getting CA Private Key
2022-06-20T06:48:37.910541058Z /certs/client/cert.pem: OK
2022-06-20T06:48:37.972929108Z time="2022-06-20T06:48:37.972749288Z" level=info msg="Starting up"
2022-06-20T06:48:37.974170320Z time="2022-06-20T06:48:37.974019428Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2022-06-20T06:48:37.974181190Z failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy
*********

My Terraform config:

  gitlab_runner_version    = "15.0.0"
  runners_privileged       = "true"
  runners_environment_vars = ["DOCKER_DRIVER=overlay2", "DOCKER_TLS_CERTDIR=/certs"]
  runners_add_dind_volumes = true

My .gitlab-ci.yml config:

image: docker:latest

services:
  - name: docker:dind
    alias: docker

Do you have a solution for that?

DnD-Magnum, Jun 20 '22

Just setting runners_add_dind_volumes = true did the trick for me with 5.5.0; I didn't have to change anything about the GitLab services.

webyneter, Dec 07 '22