gitlab-ci-local

Inability to work with CI processes that use nested Docker containers

Open ttsiodras opened this issue 1 year ago • 7 comments

Minimal .gitlab-ci.yml illustrating the issue

---
job:
  stage: build
  image: DockerImageA
  script:
    - do thing 1 inside DockerImageA
    - do thing 2 inside DockerImageA
    - docker run -v $CI_PROJECT_DIR:$CI_PROJECT_DIR -e CI_PROJECT_DIR --rm -it DockerImageB /bin/bash -c 'cd $CI_PROJECT_DIR ; do stuff inside DockerImageB'

This works fine if the GitLab Runner's config.toml lists both the "/builds" folder and "/var/run/docker.sock" under the "volumes" key of "runners.docker". The "/builds" folder then lives on the host filesystem, so the nested Docker container can access CI_PROJECT_DIR just as the initial one (DockerImageA) can.

Expected behavior

Commands executed inside DockerImageB can see the CI_PROJECT_DIR.

Host information

Not really relevant. It's an issue with how host-based folders need to be used (bind-mounted) for this to function.

Containerd binary

Docker.

Additional context

I did try passing --volume /gcl-builds, but I got an error that this folder is already mapped. Indeed it is, with a transient mapping, one that does not allow the nested Docker invocation to map it further into the container made for DockerImageB.

ttsiodras avatar Jul 04 '24 16:07 ttsiodras

Can you refactor your example, so it's actually able to run on our machines, so we can get a better understanding of what the issue is?

firecow avatar Jul 05 '24 05:07 firecow

Sure - here's a set of steps:

  • First step: make a Docker image to act in place of DockerImageA:
$ cat Dockerfile
#
# Process this Dockerfile with:
#
#     docker build -t docker_image_a .
#
FROM debian:bookworm
RUN apt-get update && \
    apt-get -qy full-upgrade && \
    apt-get install -qy curl && \
    curl -sSL https://get.docker.com/ | sh

$ docker build -t docker_image_a .
  • Second step: we will use the default debian:bookworm for DockerImageB:
$ docker pull debian:bookworm
  • Third step: this .gitlab-ci.yml
$ cat  .gitlab-ci.yml
stages:
  - build

build-something:
  stage: build
  image: docker_image_a
  script:
    - pwd
    - touch a
    - touch b
    - ls -la # This shows all files; including .gitlab-ci.yml and 'a' and 'b'
    - docker run -v $PWD:/work --rm debian:bookworm /bin/bash -c "ls -la /work" # This doesn't
  • Final step: launch gitlab-ci-local:
$ gitlab-ci-local --privileged --volume /var/run/docker.sock:/var/run/docker.sock
Using fallback git commit data
Unable to retrieve default remote branch, falling back to `main`.
Using fallback git remote data
parsing and downloads finished in 59 ms.
json schema validated in 182 ms
build-something starting docker_image_a:latest (build)
build-something copied to docker volumes in 685 ms
build-something $ pwd
build-something > /gcl-builds
build-something $ touch a
build-something $ touch b
build-something $ ls -la
build-something > total 16
build-something > drwxrwxrwx 2 root root 4096 Jul  5 09:45 .
build-something > drwxr-xr-x 1 root root 4096 Jul  5 09:45 ..
build-something > -rw-rw-rw- 1 root root  294 Jul  5 09:42 .gitlab-ci.yml
build-something > -rw-rw-rw- 1 root root  267 Jul  5 09:43 Dockerfile
build-something > -rw-r--r-- 1 root root    0 Jul  5 09:45 a
build-something > -rw-r--r-- 1 root root    0 Jul  5 09:45 b
build-something $ docker run -v $PWD:/work --rm debian:bookworm /bin/bash -c "ls -la /work"
build-something > total 8
build-something > drwxr-xr-x 2 root root 4096 Jul  5 09:42 .
build-something > drwxr-xr-x 1 root root 4096 Jul  5 09:45 ..
build-something finished in 2 s

So, the nested Docker instance (debian:bookworm) is asked to map the current folder to /work, but the mapped folder comes up empty.

In the GitLab installation it does work, because the config.toml there contains a volume directive that maps the entire /builds folder on the host to /builds inside the build container (docker_image_a in the example above). This means that it can be mapped forward to the next-level nested Docker container.

What can I do to get gitlab-ci-local to end up listing the same contents as the first level container does?

Passing --volume /gcl-builds doesn't work, since /gcl-builds is already mapped to a transient volume (--volume gcl-build-something-837138-build:/gcl-builds below):

$ gitlab-ci-local --privileged --volume /var/run/docker.sock:/var/run/docker.sock --volume /gcl-builds:/gcl-builds
Using fallback git commit data
Unable to retrieve default remote branch, falling back to `main`.
Using fallback git remote data
parsing and downloads finished in 57 ms.
json schema validated in 177 ms
build-something starting docker_image_a:latest (build)
build-something copied to docker volumes in 668 ms
Error: Command failed with exit code 1: docker create --interactive  --privileged --user 0:0 --volume gcl-build-something-837138-build:/gcl-builds --volume gcl-build-something-837138-tmp:/tmp/gitlab-ci-local-file-variables-fallback.group-fallback.project-837138 --workdir /gcl-builds --volume /var/run/docker.sock:/var/run/docker.sock --volume /gcl-builds:/gcl-builds   -e 'FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=false' \
  -e 'CI=true' \

ttsiodras avatar Jul 05 '24 09:07 ttsiodras

This is still labeled as "elaborate". Is the example I gave above sufficient, or do you need me to provide some additional information about the issue?

ttsiodras avatar Aug 09 '24 09:08 ttsiodras

Hi, I have encountered the same issue.

Let's say the repo on the host is at /home/anon/code/repo.

  1. Run the CI in a container, or have image: defined in .gitlab-ci.yml.
  2. A volume gcl-build-id-build is created and the repo is copied to it.
  3. A container is created and the volume is mapped to /gcl-builds, although the host path is /home/anon/code/repo.
  4. The CI steps are run inside the container as defined in .gitlab-ci.yml.

If any of these steps runs a container process that maps any part of the original repo, the mapping now looks like: -v /gcl-builds/<repo-relative-path>:/<desired-nested-container-path>

But due to how Docker works, the path being mapped is read from the host, not from the current container instance. So the nested container tries to map /gcl-builds/... from the host, and it doesn't resolve it to /home/anon/code/repo/...
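To make the path mismatch concrete, here is a minimal sketch of the translation a nested `docker run -v` would need. The helper name `to_host_path` is hypothetical and not part of gitlab-ci-local. It also only helps when the container path really is a bind mount of the host path; with the copied volume described above, the host folder would not contain files created during the job.

```shell
# Hypothetical helper: rewrite a path seen inside the job container into the
# path the Docker daemon on the host would have to be given.
#   $1 container_root  where the repo is mounted in the job container
#   $2 host_root       where the same files live on the host (assumed known)
#   $3 path            the path to translate
to_host_path() {
    container_root=$1
    host_root=$2
    path=$3
    case "$path" in
        "$container_root")
            printf '%s\n' "$host_root" ;;
        "$container_root"/*)
            # Strip the container prefix and graft the rest onto the host root.
            rest=${path#"$container_root"}
            printf '%s\n' "$host_root$rest" ;;
        *)
            # Paths outside the repo mount are passed through unchanged.
            printf '%s\n' "$path" ;;
    esac
}
```

For example, `to_host_path /gcl-builds /home/anon/code/repo /gcl-builds/src` prints `/home/anon/code/repo/src`, which is the source path the nested `-v` option would need.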

I'm happy to contribute, as I would like to have this resolved, but I want to clarify what the solution should be before starting work on it.

Solutions:

Add an additional argument --use-host-folder [boolean][default: false].

  1. Map the source repo folder to the container directly. Replace /gcl-builds with a variable that resolves to the git root folder. This requires skipping the volume creation and the copying of the repo into it. It might break the host ownership of the repo, as there is a chown -R with the container user (so not really something I would like).

  2. Map the volume path directly. Create the volume and get its path on the host, e.g. /var/lib/docker/volumes/gcl-build-776950-build/_data, then map that path directly (I'm not quite sure if and how this would work). All resources would still be tracked and managed by Docker. However, this might be quite problematic as the repo path grows too long; for example, on Linux, .sock files have a full path length limit of 108.

  3. Skip the Docker volume and copy the repo onto the host directly. Remove the volume creation and the copying of the repo into it. Create the folder /gcl-builds/${this.jobId} on the host, copy the repo to it, and map /gcl-builds/${this.jobId} into the container. The chown -R will not break the ownership of the original repo. But this leaves resources outside the Docker environment, which need to be managed and cleaned up manually: if the --cleanup flag is not set, the copied data must be deleted manually by the user. There may also be permission problems if the user is not allowed to create folders in / (though Docker requires root privileges anyway, so...).

In all of the above, the idea is to map the host path into the container directly, so that any nested container run from inside, mapping the repo root path, sees the same path as on the host. I think the 3rd option is the best one, but it will require additional care to handle cleanup, maybe an additional flag --cleanup-old to search for and delete old folders in /gcl-builds.

Elsewhere I'm using the first option. I have control over the repo and the image the build runs in, so the chown is not really a problem. It saves time and complexity, as during development changes can be made in the container and they persist on the host. But if the ownership of the repo breaks because of a CI run, I think it would be problematic. Though this will affect only Docker-in-Docker runs.
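A rough sketch of what the 3rd option could look like, written as plain shell rather than as gitlab-ci-local code. The function names `stage_repo` and `bind_mount_arg` are hypothetical, and the base folder is a parameter so the idea can be tried without write access to /.

```shell
# Hypothetical sketch of option 3: stage a copy of the repo on the host and
# bind-mount it at the SAME path inside the job container, so that nested
# "docker run -v" calls resolve to real host folders.

# stage_repo SRC BASE JOB_ID
# Copies SRC into BASE/JOB_ID on the host and prints the staged path.
stage_repo() {
    src=$1
    base=$2
    job_id=$3
    dest="$base/$job_id"
    mkdir -p "$dest" || return 1
    # "SRC/." copies the folder contents, dotfiles included. The original repo
    # is left untouched, so a later chown -R only affects the copy.
    cp -R "$src/." "$dest/" || return 1
    printf '%s\n' "$dest"
}

# bind_mount_arg DIR
# Prints a docker --volume argument that mounts DIR at the identical path.
bind_mount_arg() {
    printf '%s\n' "--volume $1:$1"
}
```

With something like `staged=$(stage_repo "$PWD" /gcl-builds 837138)`, the job container would get `$(bind_mount_arg "$staged")` plus the Docker socket, and the staged folder would have to be removed afterwards, which is the cleanup concern mentioned above.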

Let me know which one, if any, it should be, or any other ideas.

NGPetrov avatar Feb 21 '25 12:02 NGPetrov

I agree with you that option 3 is the best, and I'd love to have that functionality.

Can you elaborate on what you mean by "elsewhere I'm using the first option"? Does that mean you've already hacked gitlab-ci-local enough to be able to offer that functionality? If so, please publish your fork.

ttsiodras avatar Feb 22 '25 19:02 ttsiodras

No, I haven't patched gitlab-ci-local. What I meant is: in another project, where I have control over the image/container, I chose the first option, so it's easier to work on the codebase. The image is created on each host from scratch. The image user ID (id -u) is set to be the same as on the host, avoiding chown problems. This avoids unnecessary copying to a third folder. The runtime environment is configured from the image and container, but any changes to the repo made in the container persist on the host, and it's easy to just git add && git commit if needed.

As I wrote previously, I'm happy to work on this, as I am using this project and need the functionality. But first I would like one of the maintainers to say which (if any) of the above methods should be used to solve the problem.
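The user ID matching described here might be wired up roughly as follows. This is a sketch under assumptions: `HOST_UID` and `HOST_GID` are hypothetical build argument names that the Dockerfile would have to declare with ARG and use when creating its user, and the `--user` flag on the run side is an alternative added for illustration rather than something stated above.

```shell
# Print docker build arguments that hand the invoking user's IDs to the image.
# HOST_UID / HOST_GID are hypothetical ARG names; the Dockerfile must declare them.
host_id_build_args() {
    printf '%s\n' "--build-arg HOST_UID=$(id -u) --build-arg HOST_GID=$(id -g)"
}

# Print docker run arguments that bind-mount the repo at its host path and run
# as the host user, so files written in the container keep the host ownership.
host_user_run_args() {
    repo=$1
    printf '%s\n' "--user $(id -u):$(id -g) --volume $repo:$repo --workdir $repo"
}
```

The output is meant to be word-split into a command line, e.g. `docker build $(host_id_build_args) -t dev_image .`, where `dev_image` is a placeholder image name.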

NGPetrov avatar Feb 27 '25 10:02 NGPetrov

I can reproduce it locally. Labeling as a feature. We need to somehow mimic on the local machine what a GitLab runner does in /home/gitlab-runner/builds. I can't think of an easy solution off the top of my head.

firecow avatar Mar 01 '25 09:03 firecow