terraform-aws-gitlab-runner

Volume problem with dind

falcocoris opened this issue 3 years ago • 6 comments

Hi

First, I'm not at all sure that the issue is related to this module; it may very well be a GitLab issue, but I'm checking here first.

I'm trying to upgrade the GitLab runner I previously installed with this module, from module version 4.23.0 with GitLab version 13.4.1 to module version 5.1.0 with GitLab version 14.8.3, but I'm having trouble with dind volumes.

Here's my new module configuration:

module "vpc" {
  source             = "terraform-aws-modules/vpc/aws"
  version            = "2.70"
  name               = "vpc-gitlab-ci"
  cidr               = "10.0.0.0/16"
  azs                = var.vpc_availability_zones
  private_subnets    = ["10.0.1.0/24","10.0.2.0/24","10.0.3.0/24"]
  public_subnets     = ["10.0.101.0/24","10.0.102.0/24","10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
  enable_s3_endpoint = true

  tags = {
    Name = "vpc-gitlab-ci"
  }
}

module "gitlab-ci-c6axlarge" {
  source = "github.com/npalm/terraform-aws-gitlab-runner?ref=5.1.0"
  aws_region  = "eu-west-1"
  environment = "gitlab-ci-c6axlarge"
  runners_use_private_address = false

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.public_subnets, 1)
  runners_name       = "gitlab-ci-c6axlarge"
  runners_gitlab_url = "https://gitlab.com"
  gitlab_runner_registration_config = {
    registration_token = "bl4bl4-ma"
    tag_list           = "gitlab-ci-c6axlarge"
    description        = "runner docker - auto"
    locked_to_project  = "false"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }
  docker_machine_instance_type = "c6a.xlarge"
  log_group_name = "gitlab-ci-c6axlarge"
  enable_cloudwatch_logging = true
  runners_request_spot_instance = false
  docker_machine_spot_price_bid = "0.384"
  cache_bucket_prefix                  = "gitlab-ci-c6axlarge"
  cache_bucket_name_include_account_id = false
  runners_environment_vars = [
    "DOCKER_AUTH_CONFIG={\"auths\":{\"index.docker.io\":{\"auth\":\"BL4bl4\"}}}"
  ]
  enable_runner_ssm_access = true
  agent_tags = {
    Product = "ci"
    Type    = "gitlab-ci-c6axlarge"
  }
  runner_tags = {
    Product = "ci"
    Type    = "gitlab-ci-c6axlarge"
  }
  overrides = { "name_runner_agent_instance": "gitlab-ci-c6axlarge-agent", "name_docker_machine_runners": "gitlab-ci-c6axlarge-runner", "name_sg": "" }
  runners_add_dind_volumes = true
  docker_machine_iam_policy_arns = [aws_iam_policy.gitlab_ci_docker_machine_additionnal_policy.arn]
  runners_root_size = 100
  docker_machine_instance_metadata_options = { "http_put_response_hop_limit": 5, "http_tokens": "optional" }
  runner_instance_metadata_options = { "http_endpoint": "enabled", "http_put_response_hop_limit": 5, "http_tokens": "optional", "instance_metadata_tags": "disabled" }
}

My GitLab CI file:

jmtest:
  stage: jmtest
  tags:
    - gitlab-ci-c6axlarge
  image: docker:20.10.16
  services:
    - docker:dind
  variables:
    FF_NETWORK_PER_BUILD: "true"     # activate container-to-container networking
  script:
    - docker run --rm --name jmtest --volume  "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
    - sleep 300

The output on the old runner:

$ docker run --rm --name jmtest --volume  "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
Unable to find image 'debian:latest' locally
latest: Pulling from library/debian
1671565cc8df: Pulling fs layer
1671565cc8df: Verifying Checksum
1671565cc8df: Download complete
1671565cc8df: Pull complete
Digest: sha256:d52921d97310d0bd48dab928548ef539d5c88c743165754c57cfad003031386c
Status: Downloaded newer image for debian:latest
/builds/me/web/infra
/builds/me/web/infra/keys.tf
/builds/me/web/infra/main.tf
(...)
$ sleep 300

The output on the new one:

$ docker run --rm --name jmtest --volume  "$(pwd)":"$(pwd)" --workdir "$(pwd)" --network=host debian find $(pwd)
Unable to find image 'debian:latest' locally
latest: Pulling from library/debian
1671565cc8df: Pulling fs layer
1671565cc8df: Verifying Checksum
1671565cc8df: Download complete
1671565cc8df: Pull complete
Digest: sha256:d52921d97310d0bd48dab928548ef539d5c88c743165754c57cfad003031386c
Status: Downloaded newer image for debian:latest
/builds/me/web/infra
$ sleep 300

My GitLab repository files aren't there in the dind container: find only lists the /builds/me/web/infra directory itself, not its contents.

The same CI file works fine with my older runner.

I've been searching for days now and am out of ideas about where to look.

falcocoris, Aug 23 '22 15:08

OK, I finally found what was wrong.

In version 4.36 you added the "runners_add_dind_volumes" option to match the TLS change introduced in Docker 19.03. It mounts the required certificate volume, but it also always mounts "/var/run/docker.sock". That switches the job from a DinD (Docker-in-Docker) approach to a DooD (Docker-outside-of-Docker) approach, which changes how the GitLab repository files must be accessed from a docker run inside the CI job.
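A quick way to tell which of the two setups a job is actually in is to check which daemon the docker CLI will talk to. The sketch below uses a hypothetical helper, detect_docker_mode, which is not part of the module or of GitLab Runner. It assumes the CLI uses DOCKER_HOST when it is set and otherwise falls back to /var/run/docker.sock, and that a tcp:// endpoint such as tcp://docker:2376 points at the docker:dind service. If I read the official docker image's entrypoint correctly, it only points DOCKER_HOST at the dind service when that socket is absent, which would explain why mounting the socket quietly turns DinD into DooD.

```shell
#!/bin/sh
# Hypothetical helper: guess which Docker daemon a plain `docker` command in
# the job container will reach. Assumes the CLI honours DOCKER_HOST first and
# otherwise uses the default Unix socket.
#
#   $1  DOCKER_HOST value (defaults to the current environment)
#   $2  socket path       (defaults to /var/run/docker.sock)
detect_docker_mode() {
  host=${1-${DOCKER_HOST:-}}
  sock=${2:-/var/run/docker.sock}
  case $host in
    tcp://*) echo dind ;;  # e.g. tcp://docker:2376, usually the docker:dind service
    unix://*) echo dood ;; # explicit Unix socket, usually the host's daemon
    "")
      # No DOCKER_HOST: the CLI falls back to the socket, if one is mounted.
      if [ -S "$sock" ]; then echo dood; else echo none; fi ;;
    *) echo unknown ;;     # ssh://, npipe://, ... are not handled here
  esac
}

# Example use in a job script:
#   echo "docker CLI target: $(detect_docker_mode)"
#   docker info --format '{{.Name}}'  # daemon host name; under DooD this should
#                                     # be the docker-machine instance
```

With runners_add_dind_volumes = true and no DOCKER_HOST in the job variables, this would be expected to print dood, matching what I saw.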

I'm explaining it here in case someone else struggles the way I did.

With DooD:

  1. The job spawns an instance via docker-machine, which becomes the host.
  2. The instance spawns a container (let's call it Alpha) from the image specified in the CI file, to execute the commands.
  3. One of the commands executes a docker run, which creates another container (Beta) on the host, since we're using DooD.
  4. My GitLab repository is available under /builds in Alpha. If I need it in Beta, I have to pass the host's path to the -v option, not Alpha's path, because the docker command is executed by the host's Docker daemon, not by Alpha. But what is the host's path? Let's check:
root@runner-kgl2jxg-runner-1662023149-189796a6:/home/ubuntu# locate myrepo
/var/lib/docker/volumes/runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data/mycompany/web/myrepo.tmp/git-template
/var/lib/docker/volumes/runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data/mycompany/web/myrepo.tmp/git-template/config

root@runner-kgl2jxg-runner-1662023149-189796a6:/home/ubuntu# docker volume ls
DRIVER    VOLUME NAME
local     runner-kgl2jxg-project-19427459-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70
local     runner-kgl2jxg-project-19427459-concurrent-0-cache-904f6ed42e0fa2b14c1d7a2ed6f1875e
local     runner-kgl2jxg-project-19427459-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8
  5. Since the GitLab repository lives in a Docker volume created by the CI, I use --volumes-from Alpha when I launch Beta: docker run --volumes-from $(docker ps | grep runner | awk '{print $1}') ubuntu echo success
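The docker ps | grep runner one-liner in step 5 works as long as exactly one running container's line contains "runner". If another one matches (a service container, for example), the unquoted substitution expands to several IDs and the extra ones are parsed as the image name. Below is a more defensive sketch of steps 4 and 5. Both helpers (to_host_path, pick_single_container) are hypothetical names, and the "-build" name fragment used in the example is an assumption about GitLab Runner's container naming, so check docker ps on your own runner first.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4 and 5; neither is part of GitLab Runner
# or of this module.

# to_host_path CONTAINER_PATH MOUNT_DEST MOUNT_SOURCE
# Translate a path seen inside Alpha into the matching path on the host,
# given one of Alpha's mounts (destination inside Alpha, source on the host).
to_host_path() {
  cpath=$1 dest=${2%/} src=${3%/}
  case $cpath in
    "$dest" | "$dest"/*) printf '%s%s\n' "$src" "${cpath#"$dest"}" ;;
    *) echo "$cpath is not under $dest" >&2; return 1 ;;
  esac
}

# pick_single_container PATTERN
# Read `docker ps --format '{{.ID}} {{.Names}}'` lines on stdin and print the
# ID of the single container whose name contains PATTERN. Fails when zero or
# several containers match, so --volumes-from never receives a list of IDs.
pick_single_container() {
  awk -v p="$1" '
    index($2, p) > 0 { n++; id = $1 }
    END {
      if (n != 1) {
        printf "expected 1 container matching \"%s\", found %d\n", p, n > "/dev/stderr"
        exit 1
      }
      print id
    }'
}

# Example use inside the job (assumes Alpha's name contains "-build"):
#   alpha=$(docker ps --format '{{.ID}} {{.Names}}' | pick_single_container -build)
#   docker run --rm --volumes-from "$alpha" debian ls "$CI_PROJECT_DIR"
#   src=$(docker inspect --format '{{range .Mounts}}{{if eq .Destination "/builds"}}{{.Source}}{{end}}{{end}}' "$alpha")
#   docker run --rm -v "$(to_host_path "$CI_PROJECT_DIR" /builds "$src")":/src debian ls /src
```

The --volumes-from route keeps the /builds paths identical in Beta, while the host-path route lets you mount only the project directory at a path of your choice.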

I'm sure most of you know the difference between DooD and DinD well enough, but I didn't, so I'm leaving this here in case someone ends up in the same situation one day.

Is this worth documenting somewhere?

falcocoris, Sep 01 '22 09:09

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot], Mar 02 '23 03:03

@falcocoris Could you please add a short description of your solution to the README.md? I think it is worth explaining for other users.

kayman-mk, Mar 02 '23 10:03

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot], Jan 16 '24 02:01

This issue was closed because it has been stalled for 15 days with no activity.

github-actions[bot], Feb 01 '24 02:02

Needs to be documented

kayman-mk, Feb 22 '24 09:02

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot], Apr 23 '24 02:04

This issue was closed because it has been stalled for 15 days with no activity.

github-actions[bot], May 09 '24 02:05