Support Runner inside of Docker Container
Describe the enhancement
Fully support all features when the runner is within a Docker container.
Not all features are currently supported when the runner is within a Docker container, specifically those features that use Docker, like Docker-based Actions and services. Running self-hosted runners using Docker is an easy way to scale out runners on some sort of Docker-based cluster, and an easy way to provide clean workspaces for each run (with ./run.sh --once).
Code Snippet
Possible implementation that I am using now.
Additional information
There are a few areas of concern when the runner executes in a Docker container:
- Filesystem access for other containers needed as part of the job. This can be resolved by using a volume mount from the host with matching host and container paths (for example, docker run -v /home/github:/home/github, although it doesn't have to be this exact directory) and telling the runner to use a directory within that mount as its work directory (./config.sh --work /home/github/work). This works with the current volume-mounting behaviour for containers created by the runner. It would need to be documented as part of the setup process for a Docker-based runner.
- Network access between the runner and other containers needed as part of the job. This could be resolved by not creating a network as part of the run and instead optionally accepting an existing network to be used. I have found that it works well to use --network container:<container ID of the runner> to reuse the network from the runner container without having to orchestrate a network created via docker network create. There is no straightforward way to discover the network or ID of a container from within it, so it would likely need to be the responsibility of the user to pass this information to the runner. I currently do something like "container:$(cat /proc/self/cgroup | grep "cpu" | head -n 1 | rev | cut -d/ -f 1 | rev)" from within the runner container to find the ID and pass this to the runner, although this isn't guaranteed to work in all cases. A combined sketch of both workarounds follows this list.
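Putting those two workarounds together, a minimal sketch (my-runner is a hypothetical image whose entrypoint configures and starts the runner; the host Docker socket is mounted so the runner can create job containers):

# matching host and container paths so volume mounts created by the
# runner resolve on the host exactly as they do inside the runner
docker run -d --name gha-runner \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /home/github:/home/github \
  my-runner ./config.sh --work /home/github/work

# from inside the runner container: derive this container's ID and reuse
# its network for job containers (not guaranteed to work in all cases)
network="container:$(cat /proc/self/cgroup | grep "cpu" | head -n 1 | rev | cut -d/ -f 1 | rev)"
docker run --rm --network "${network}" alpine ip addr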
There appear to be a couple more things that need to be done to account for multiple runners on the same host concurrently:
- docker network prune cannot run concurrently and should likely be retried if such an error is received (see the retry sketch after this list):

/usr/local/bin/docker network prune --force --filter "label=898d1dec6adc"
Error response from daemon: a prune operation is already running
##[warning]Delete stale container networks failed, docker network prune fail with exit code 1
- The docker label is not sufficient for isolating separate runners on the same host. The current hash of the root directory will result in the same label being used for all runners with the exact same version. In my testing I've switched this to use the hostname, but perhaps something like the runner name or run ID could be used.
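A rough sketch of the retry idea for the prune failure above (hypothetical shell, not actual runner code; the label value is taken from the error message):

# retry docker network prune when another prune operation is in flight
for attempt in 1 2 3 4 5; do
  if docker network prune --force --filter "label=898d1dec6adc"; then
    break
  fi
  echo "prune already running, retrying (attempt ${attempt})" >&2
  sleep 2
done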
@TingluoHuang @bryanmacfarlane I'm hoping to get your feedback on this – getting official support for this would be a huge help for me. I'm happy to work on an implementation if that is helpful.
This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. So our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.
Fully agree. For the time being we have built a scalable solution on AWS spot instances to serve our Docker builds. A detailed blog post and a reference to the code: https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners
Currently we have to create workarounds using non-optimal solutions to deploy tens of runners, or workarounds for container usage in jobs, which is rather ugly. How can we raise the priority of this?
Just curious, how do others manage scaling the runners? This is probably one of the most interesting approaches I've seen so far. I guess many of us faced this same challenge when scaling gh-runners. "Official" scaling proposals from GitHub would be more than welcome :).
Big kudos to @npalm and their solution on AWS. We also built a similar solution for GCP, allowing us to scale our self-hosted runners for a whole GitHub organization: https://github.com/faberNovel/terraform-gcp-github-runner
@jupe we use https://github.com/summerwind/actions-runner-controller which has worked really well for us so far
waiting for this one so bad
@jpb Is there any possibility to get a higher priority on this?
For now we will use a workaround based on docker-compose. We have the following docker-compose.yaml file in the repo to set up the services:
version: "3.3"
services:
nginx:
image: nginx
redis:
image: redis
We then connect the self-hosted runner, which is also running inside Docker, to the created network. This is one example workflow:
name: Start docker compose
on:
  workflow_dispatch:
jobs:
  start-docker-compose:
    # should run as a docker container with a connection to the dockerd of the host
    runs-on: [self-hosted]
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          fetch-depth: 1
      - name: Start docker compose
        id: start-docker-compose
        run: |
          project_prefix=my-project
          project_name="${project_prefix}-$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13 | tr '[:upper:]' '[:lower:]')"
          my_container_id=$(grep docker /proc/self/cgroup | head -n 1 | sed "s|^.*/docker/\(.*\)|\\1|")
          docker-compose -p "${project_name}" up -d
          while ! docker network inspect "${project_name}_default" > /dev/null ; do
            sleep 1
          done
          docker network connect "${project_name}_default" "${my_container_id}"
          echo "::set-output name=my_container_id::$my_container_id"
          echo "::set-output name=project_name::$project_name"
      - name: Check output
        run: |
          echo "Project name: ${{ steps.start-docker-compose.outputs.project_name }}"
          echo "Container ID: ${{ steps.start-docker-compose.outputs.my_container_id }}"
      - name: Use the started docker compose services
        run: |
          # Install netcat to check redis
          apt-get update
          apt-get install -y netcat
          echo '--------------------'
          ping -c 1 nginx
          curl nginx
          echo '--------------------'
          ping -c 1 redis
          echo ping | netcat -w 2 redis 6379
      - name: Cleanup started docker compose services
        if: always()
        run: |
          docker network disconnect ${{ steps.start-docker-compose.outputs.project_name }}_default ${{ steps.start-docker-compose.outputs.my_container_id }}
          docker-compose -p ${{ steps.start-docker-compose.outputs.project_name }} down
@jpb, since you asked me, I'm ➕ on this, but adding @hross to weigh in since he's driving the runner area now. 🚀
We still want to do this and it's on our list but we don't have a date or schedule for shipping this type of feature right now.
Would love to see this prioritized. Can't really run docker-in-docker on Kubernetes self-hosted runners without this.
Any update on this issue?
Ping @bryanmacfarlane =)
Plus one here, any update on an ETA? @hross
Another +1 here, for me this is blocking some 3rd-party deployment workflows with private AKS clusters
Another +1 again, waiting for this feature so badly!
+1
@hross at least according to this thread, you are the most recent driver on this topic. Has anything changed since your July 2021 post regarding prioritisation? Thanks for your assistance!
Just started to look into Runners within Containers. Seems that full support would be a great help.
This issue has been opened for 2 years now, is there any plan to support it? It seems like a lot of people are requesting it
@nikola-jokic @fhammerl @TingluoHuang @ruvceskistefan @thboop may we have someone assigned to this issue, to at least make a decision about what is going to happen? Sorry for the ping, but something must be done here IMO.
A workaround is to put everything into a docker-compose file and then use the --exit-code-from option available in the docker-compose command:
--exit-code-from: Return the exit code of the selected service container. Implies --abort-on-container-exit
For example (needs modifications):
docker-compose:
version: "3"
services:
postgresql:
image: postgres:latest
environment:
- POSTGRES_DB=...
- POSTGRES_USER=...
- POSTGRES_PASSWORD=...
server:
build: .
volumes:
- src:/src
- ./cache/gradle:/root/.gradle
environment:
- DB_HOST=postgresql
- DB_NAME=...
- DB_USER=...
- DB_PASSWORD=...
command: ["bash", "-c", "./wait-for-it.sh -t 0 postgresql:5432 -- ./gradlew test --info"]
Workflow:
name: Server QA
on:
  push:
jobs:
  test:
    runs-on: self-hosted
    steps:
      - name: Run tests
        run: docker-compose up --exit-code-from server --force-recreate --build
Makes sense, I did something similar for DNS Resolver ENIs with CloudWatch as inputs.
The main issue I see with all this is access to docker.sock... the whole docker-in-docker with root access scenario. https://github.com/myoung34/docker-github-actions-runner/issues/61 From the examples mentioned here: https://github.com/actions/runner/issues/367#issuecomment-597742895 You can try rootless (see the sketch at the end of this comment): https://docs.docker.com/engine/security/rootless/#rootless-docker-in-docker
But then you run into limitations. So with a "runner" Docker container running inside another rootless Docker container, inside your typical rooted Docker, can you even do docker build then? BuildKit: https://www.containiq.com/post/docker-alternatives
And apparently some things have stopped working: https://github.com/actions/runner/issues/2103
And then again, at the end of the day, you'll want container orchestration to bring your runners up and down.
Could your actions consider Docker alternatives to build images with a container runner?
https://snyk.io/blog/building-docker-images-kubernetes/
For our team specifically, we do want container runners that are also able to run containers.
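For reference, starting the rootless Docker-in-Docker daemon mentioned above looks roughly like this (a sketch using the official docker:dind-rootless image; most hosts still require --privileged for it):

# rootless dockerd running inside a container; a runner would then point
# its Docker CLI at this daemon via DOCKER_HOST
docker run -d --name dind-rootless --privileged docker:dind-rootless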
Any update on this or new news?
Also need to plus-one this issue. I've tried every workaround, including the latest changes to https://github.com/actions/runner/blob/main/images/Dockerfile by @TingluoHuang and the team, but having the CLI isn't really much use unless we can run docker pull xxx. More specifically, anyone developing actions has to be hyper-aware of what the action is written in.
I really want to run my jobs using that container feature :(
...
jobs:
  job:
    runs-on:
      labels:
        - self-hosted
        - linux
        - ${{ inputs.RUNNER_LABEL }}
    container:
      image: ${{ inputs.DOCKER_IMAGE }}
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - run: |
          sfdx version --json
Since I can't execute jobs that run inside containers, and my workflows don't need any other dockerized services, I can get around this limitation by creating PODs with a Docker image that has the runner plus everything else necessary to run the job. The downside is that I need to create a new image. The following image shows exactly what I'm thinking about:

Note: I don't need more than one node. If an entire node goes down, I can just wait for EKS to recreate it, as well as its PODs.
With the workaround architecture in place, I can then remove the container configuration from the job manifest:
...
jobs:
  job:
    runs-on:
      labels:
        - self-hosted
        - linux
        - ${{ inputs.RUNNER_LABEL }}
    steps:
      - run: |
          sfdx version --json
Note: I'm just not sure whether the storage is somehow going to be shared by the PODs or whether it is unique per POD, even when using the same name. If the storage is shared between PODs, then one job could impact another if both PODs run on the same node.
I have a POD running a github-runner container and a dind container. When a job that runs in a container is taken by the github-runner container, the job can't execute a simple inline script such as echo hello. Am I doing something wrong, or is it a problem caused by this issue, as stated in this other issue?
In the following image you can see that the container is created by the dind container without a problem, but the inline script can't be executed inside the container:

This is my Kubernetes deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-public
  namespace: github-runners
  labels:
    app: github-runner-public
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-runner-public
  template:
    metadata:
      labels:
        app: github-runner-public
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: public-nodegroup
      containers:
        - name: github-runner
          image: 225537886698.dkr.ecr.eu-west-1.amazonaws.com/github-runner-test:v1.1.3
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: DOCKER_API_VERSION
              value: "1.42"
          volumeMounts:
            - name: runner-workspace
              mountPath: /actions-runner/_work
        - name: dind-daemon
          image: docker:23.0.5-dind
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-graph-storage
              mountPath: /var/lib/docker
      volumes:
        - name: docker-graph-storage
          emptyDir: {}
        - name: runner-workspace
          emptyDir: {}
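As a quick sanity check of this wiring (a sketch; run from a shell inside the github-runner container, which shares the POD's network namespace with the dind sidecar):

# DOCKER_HOST=tcp://localhost:2375 points the CLI at the dind daemon
docker version
docker run --rm alpine echo hello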
Why can't the container that runs inside dind access /actions-runner/_work/_temp from the github-runner container? I don't get it.
After reading this post, I understood that directories from the container inside dind would be mapped to directories inside the github-runner container. So, if the container created by dind maps /actions-runner/_work from the github-runner container to /__w inside the job container, as shown below, why isn't /__w/_temp/<bla>.sh available?
/actions-runner/_work (volume in the node) <- github-runner [/actions-runner/_work] -> (dind) -> my-container [/__w]
Shouldn't /actions-runner/_work/_temp/<bla>.sh be inside the volume?

This is the content of the /actions-runner/_work/_temp directory inside the github-runner container. For some reason it is empty. Does this mean that the controller can't create the inline script inside the runner when it is running in a container?