terraform-aws-gitlab-runner
dind docker builds: how to make it work like GitLab shared runners?
We're using a runner based on the runner-default example, and overall it's been great: we can run most of our builds on it, and they run much faster than they did on the GitLab shared runners we were using before.
The jobs we're not yet running on the EC2 runner are those that perform Docker builds, using dind like this:
example-job:
  image: docker:20.10
  variables:
    DOCKER_HOST: tcp://docker:2375/
  services:
    - docker:20.10-dind
  script: |
    docker build (...)
These work on the shared runners, but fail on our EC2 runners with a message like this:
Cannot connect to the Docker daemon at tcp://docker:2375/. Is the docker daemon running?
Based on my current shaky understanding of the various pieces (this project, docker-machine, GitLab runners, etc.), I think the important question is:
How do I configure an AWS runner to be able to perform dind docker builds the same way as GitLab's runners?
Would you expect the above job to work on a runner provisioned with the runner-default example? (if so, then I'll more closely diff my config against that)
Do I need to set any specific inputs?
Here's my current configuration:
main.tf
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_security_group" "default" {
  name   = "default"
  vpc_id = module.vpc.vpc_id
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.70"

  name = "vpc-${var.environment}"
  cidr = "10.0.0.0/16"

  azs                = [data.aws_availability_zones.available.names[0]]
  private_subnets    = ["10.0.1.0/24"]
  public_subnets     = ["10.0.101.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
  enable_s3_endpoint = true

  tags = {
    Environment = var.environment
  }
}

module "gitlab-runner" {
  # https://registry.terraform.io/modules/npalm/gitlab-runner/aws/
  source  = "npalm/gitlab-runner/aws"
  version = "4.27.0"

  aws_region  = var.aws_region
  environment = var.environment

  vpc_id                   = module.vpc.vpc_id
  subnet_ids_gitlab_runner = module.vpc.private_subnets
  subnet_id_runners        = element(module.vpc.private_subnets, 0)

  runners_name       = var.runner_name
  runners_gitlab_url = "https://gitlab.com"

  gitlab_runner_registration_config = {
    registration_token = var.registration_token
    tag_list           = var.tag_list
    description        = var.runner_description
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }
}
variable "registration_token" {
  description = "gitlab registration token"
  type        = string
}

variable "aws_region" {
  type = string
}

variable "environment" {
  description = "A name that identifies the environment, will be used as a prefix and for tagging."
  type        = string
  default     = "runners-default"
}

variable "runner_name" {
  description = "Name of the runner, will be used in the runner config.toml"
  type        = string
}

variable "runner_description" {
  description = "This description will appear in the GitLab runner list"
  type        = string
}

variable "tag_list" {
  description = "GitLab tags for this runner"
  type        = string
}
Hopefully I'm missing something obvious, but it might be worth adding a section to the Readme along the lines of "here's how to make your runner support dind docker builds", perhaps with separate guidance for TLS and non-TLS setups.
Hey @jrr I faced the same thing and this issue solved it for me!
For anyone coming across this, we've managed to get docker builds working on the EC2 agent by adjusting the CI config to look like this:
services:
  # Configuration to support docker builds on custom EC2 runner from
  # https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27300#note_571697847
  - name: docker:dind
    command: ["--tls=false"]
variables:
  DOCKER_HOST: tcp://docker:2375/
  DOCKER_TLS_CERTDIR: ""
image: docker:20.10
script: |
  docker build (...)
But the issue's question remains: what would it take to not require the --tls=false and DOCKER_TLS_CERTDIR workarounds?
Update 3/2022
For reference, my project's latest (and working!) dind builds look like this:
docker-build:
  tags:
    - ec2-runner
  services:
    - name: docker:20.10.12-dind
      alias: docker
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  image: docker:20.10.12
  script: docker build (...)
Details from the terraform project:
gitlab_runner_version = "14.8.2"
runners_privileged = "true"
runners_add_dind_volumes = true
runners_helper_image = "..." // (we host this ourselves, but it matches the runner version above)
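As a rough sketch, those inputs would sit inside the module block from the original question something like this. It assumes the remaining inputs stay as they were and that your pinned module release supports these variables; the helper image value is left elided, as above.

```hcl
module "gitlab-runner" {
  source = "npalm/gitlab-runner/aws"
  # version: keep whichever release you already pin (assumption: it accepts the inputs below)

  # ... aws_region, environment, vpc_id, subnets and registration config as before ...

  gitlab_runner_version    = "14.8.2"
  runners_privileged       = "true" # dind needs privileged job containers
  runners_add_dind_volumes = true   # extra volumes for dind; see the module's variable description
  runners_helper_image     = "..."  # elided; self-hosted, matching gitlab_runner_version
}
```

With this in place, the job only needs the dind service and DOCKER_TLS_CERTDIR as shown in the CI config above.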
You can set DOCKER_TLS_CERTDIR in the GitLab Runner config to default it for all jobs:
[[runners]]
environment = ["DOCKER_TLS_CERTDIR=/certs"]
Which means with this project you can set it like so as well:
runners_environment_vars = ["DOCKER_TLS_CERTDIR=/certs"]
Hi guys,
I tried your configuration, but I get a warning:
WARNING: Service docker:dind is already created. Ignoring.
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-csrvkpu-project-29177080-concurrent-0-0c7113aa9b112e39-docker-0 probably didn't start properly.
Health check error:
service "runner-csrvkpu-project-29177080-concurrent-0-0c7113aa9b112e39-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2022-06-20T06:48:35.784174517Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:36.233743156Z ....................................................................................++++
2022-06-20T06:48:36.457276661Z ...........................................................++++
2022-06-20T06:48:36.457737473Z e is 65537 (0x010001)
2022-06-20T06:48:36.470590018Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:37.377644029Z ....................................................................................................................................................................................................................................................++++
2022-06-20T06:48:37.391673751Z ..++++
2022-06-20T06:48:37.392155836Z e is 65537 (0x010001)
2022-06-20T06:48:37.417577680Z Signature ok
2022-06-20T06:48:37.417591929Z subject=CN = docker:dind server
2022-06-20T06:48:37.417799225Z Getting CA Private Key
2022-06-20T06:48:37.428089288Z /certs/server/cert.pem: OK
2022-06-20T06:48:37.430983684Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-06-20T06:48:37.583050969Z .......................................++++
2022-06-20T06:48:37.877757247Z ..............................................................................++++
2022-06-20T06:48:37.878241583Z e is 65537 (0x010001)
2022-06-20T06:48:37.900008285Z Signature ok
2022-06-20T06:48:37.900023204Z subject=CN = docker:dind client
2022-06-20T06:48:37.900228438Z Getting CA Private Key
2022-06-20T06:48:37.910541058Z /certs/client/cert.pem: OK
2022-06-20T06:48:37.972929108Z time="2022-06-20T06:48:37.972749288Z" level=info msg="Starting up"
2022-06-20T06:48:37.974170320Z time="2022-06-20T06:48:37.974019428Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2022-06-20T06:48:37.974181190Z failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy
*********
My Terraform config:
gitlab_runner_version = "15.0.0"
runners_privileged = "true"
runners_environment_vars = ["DOCKER_DRIVER=overlay2", "DOCKER_TLS_CERTDIR=/certs"]
runners_add_dind_volumes = true
My .gitlab-ci.yml config:
image: docker:latest
services:
  - name: docker:dind
    alias: docker
Do you have a solution for that?
Just setting runners_add_dind_volumes = true did the trick for me with 5.5.0; didn't have to change anything about GitLab services.