terraform-aws-gitlab-runner
Volume problem with DinD
Hi
First, I'm not at all sure that this issue is related to this module. It may well be a GitLab issue, but I'm checking here first.
I'm trying to upgrade a GitLab runner that I installed with this module. Before: module version 4.23.0 with GitLab version 13.4.1. Now: module version 5.1.0 with GitLab version 14.8.3. Since the upgrade I'm having trouble with DinD volumes.
Here's my new module configuration:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.70"

  name = "vpc-gitlab-ci"
  cidr = "10.0.0.0/16"
  azs  = var.vpc_availability_zones

  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
  enable_s3_endpoint = true

  tags = {
    Name = "vpc-gitlab-ci"
  }
}
module "gitlab-ci-c6axlarge" {
  source = "github.com/npalm/terraform-aws-gitlab-runner?ref=5.1.0"

  aws_region  = "eu-west-1"
  environment = "gitlab-ci-c6axlarge"

  runners_use_private_address = false
  vpc_id                      = module.vpc.vpc_id
  subnet_id                   = element(module.vpc.public_subnets, 1)

  runners_name       = "gitlab-ci-c6axlarge"
  runners_gitlab_url = "https://gitlab.com"

  gitlab_runner_registration_config = {
    registration_token = "bl4bl4-ma"
    tag_list           = "gitlab-ci-c6axlarge"
    description        = "runner docker - auto"
    locked_to_project  = "false"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  docker_machine_instance_type  = "c6a.xlarge"
  log_group_name                = "gitlab-ci-c6axlarge"
  enable_cloudwatch_logging     = true
  runners_request_spot_instance = false
  docker_machine_spot_price_bid = "0.384"

  cache_bucket_prefix                  = "gitlab-ci-c6axlarge"
  cache_bucket_name_include_account_id = false

  runners_environment_vars = [
    "DOCKER_AUTH_CONFIG={\"auths\":{\"index.docker.io\":{\"auth\":\"BL4bl4\"}}}"
  ]

  enable_runner_ssm_access = true

  agent_tags = {
    Product = "ci"
    Type    = "gitlab-ci-c6axlarge"
  }
  runner_tags = {
    Product = "ci"
    Type    = "gitlab-ci-c6axlarge"
  }

  overrides = { "name_runner_agent_instance": "gitlab-ci-c6axlarge-agent", "name_docker_machine_runners": "gitlab-ci-c6axlarge-runner", "name_sg": "" }

  # Mounts the TLS certificate volume and /var/run/docker.sock into job containers
  runners_add_dind_volumes = true

  docker_machine_iam_policy_arns = [aws_iam_policy.gitlab_ci_docker_machine_additionnal_policy.arn]
  runners_root_size              = 100

  docker_machine_instance_metadata_options = { "http_put_response_hop_limit": 5, "http_tokens": "optional" }
  runner_instance_metadata_options         = { "http_endpoint": "enabled", "http_put_response_hop_limit": 5, "http_tokens": "optional", "instance_metadata_tags": "disabled" }
}
My GitLab CI file:
jmtest:
  stage: jmtest
  tags:
    - gitlab-ci-c6axlarge
  image: docker:20.10.16
  services:
    - docker:dind
  variables:
    FF_NETWORK_PER_BUILD: "true" # activate container-to-container networking
  script:
    - docker run --rm --name jmtest --volume "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
    - sleep 300
The output on the old runner:
docker run --rm --name jmtest --volume "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
Unable to find image 'debian:latest' locally
latest: Pulling from library/debian
1671565cc8df: Pulling fs layer
1671565cc8df: Verifying Checksum
1671565cc8df: Download complete
1671565cc8df: Pull complete
Digest: sha256:d52921d97310d0bd48dab928548ef539d5c88c743165754c57cfad003031386c
Status: Downloaded newer image for debian:latest
/builds/me/web/infra
/builds/me/web/infra/keys.tf
/builds/me/web/infra/main.tf
(...)
$ sleep 300
The output on the new runner:
docker run --rm --name jmtest --volume "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
Unable to find image 'debian:latest' locally
latest: Pulling from library/debian
1671565cc8df: Pulling fs layer
1671565cc8df: Verifying Checksum
1671565cc8df: Download complete
1671565cc8df: Pull complete
Digest: sha256:d52921d97310d0bd48dab928548ef539d5c88c743165754c57cfad003031386c
Status: Downloaded newer image for debian:latest
/builds/me/web/infra
$ sleep 300
On the new runner, my GitLab repository files are missing from the container started by docker run: only the empty working directory is listed.
The same CI file works fine with my older runner.
I've been searching for days now and am out of ideas about where to look.
OK, I finally found what is wrong.
Since version 4.36, the module has had the "runners_add_dind_volumes" option, which was added to match the TLS change introduced by Docker 19.03. It mounts the needed certificate volume, but it also systematically mounts "/var/run/docker.sock". That switches the runner from a DinD approach to a DooD (Docker outside of Docker) approach, which changes how the GitLab repository files must be accessed from a docker run inside the CI job.
I'm explaining it here in case someone else is struggling like I did.
With DooD:
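To check which approach a job is actually using, something like the following could be run at the start of the script section. This is only a rough heuristic, not something the module provides: the function name `docker_mode` is made up for illustration, and it assumes the Docker client falls back to the default Unix socket when DOCKER_HOST is unset.

```shell
# Hypothetical helper (not part of the module or of GitLab Runner).
# Prints a guess at which Docker daemon the docker CLI in this container will reach.
docker_mode() {
  sock="${1:-/var/run/docker.sock}"   # default Docker socket path
  if [ -n "${DOCKER_HOST:-}" ]; then
    echo "explicit"                   # DOCKER_HOST is set, so the client uses it
  elif [ -S "$sock" ]; then
    echo "dood"                       # host socket is mounted: Docker outside of Docker
  else
    echo "dind"                       # no host socket: a docker:dind service would be needed
  fi
}
```

Calling `docker_mode` with no argument checks /var/run/docker.sock. A result of "dood" suggests that containers started by the job are siblings on the host, so volume paths are resolved on the host rather than inside the job container.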
- The job spawns an instance via docker-machine, which becomes the host.
- The instance spawns a container (let's call it Alpha) from the image requested in the CI file, to execute the commands.
- One of the commands executes a docker run, which creates another container (Beta) on the host, since we're using DooD.
- My GitLab repository is available under /builds in Alpha. If I need it in Beta, the -v option must use the host's path, not Alpha's path, because the docker command is executed by the host's daemon, not by Alpha. But what is the host's path? Let's check:
root@runner-kgl2jxg-runner-1662023149-189796a6:/home/ubuntu# locate myrepo
/var/lib/docker/volumes/runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data/mycompany/web/myrepo.tmp/git-template
/var/lib/docker/volumes/runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data/mycompany/web/myrepo.tmp/git-template/config
root@runner-kgl2jxg-runner-1662023149-189796a6:/home/ubuntu# docker volume ls
DRIVER VOLUME NAME
local runner-kgl2jxg-project-19427459-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70
local runner-kgl2jxg-project-19427459-concurrent-0-cache-904f6ed42e0fa2b14c1d7a2ed6f1875e
local runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8
- Since the GitLab repository lives in a Docker volume created by the CI, I use --volumes-from with Alpha's container ID when I launch Beta:
docker run --volumes-from $(docker ps | grep runner | awk '{print $1}') ubuntu echo success
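One caveat with the command above: if `docker ps | grep runner` matches more than one container, the command substitution expands to several IDs and docker run will misread the extra ones. A minimal sketch of a stricter variant follows. The helper name `first_matching_container` is hypothetical, and the assumption that the first match is the job container (Alpha) should be verified on your own runner.

```shell
# Hypothetical helper: reads "ID NAME" lines on stdin and prints the ID of the
# first container whose name contains the given pattern (default: "runner").
# Exits with a non-zero status if nothing matches.
first_matching_container() {
  pattern="${1:-runner}"
  awk -v pat="$pattern" '
    index($2, pat) > 0 { print $1; found = 1; exit }
    END { exit !found }
  '
}

# Usage inside a job (requires a reachable Docker daemon):
#   job_container=$(docker ps --format '{{.ID}} {{.Names}}' | first_matching_container runner)
#   docker run --rm --volumes-from "$job_container" ubuntu ls /builds
```

Using `--format '{{.ID}} {{.Names}}'` keeps the match on the container name only, so an image or command that happens to contain "runner" does not produce a false match.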
I'm sure most of you already knew the difference between DooD and DinD well enough, but I didn't, so I'm leaving this here in case someone ends up in the same situation one day.
Is this worth documenting somewhere?
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
@falcocoris Could you please add short documentation for your solution to the README.md? I think it is worth explaining for other users.
This issue was closed because it has been stalled for 15 days with no activity.
Needs to be documented