actions-runner-controller
Containers created by GitHub workflows have wrong dockerMTU
Describe the bug: No network connectivity from Docker containers created by GitHub workflows (dind, Docker in Docker).
To Reproduce: Steps to reproduce the behavior:
- Create a workflow with a Docker container image, e.g.:
name: Setup Go
on:
  workflow_dispatch:
jobs:
  setupgo:
    runs-on: [kubernetes]
    container:
      image: 'ubuntu:latest'
    steps:
      - name: Set up Go
        uses: actions/setup-go@v2
        with:
          go-version: 1.17
      - name: sleep
        shell: bash
        run: sleep 300
- Exec into the dind sidecar container and confirm that the MTU parameter is propagated properly from the RunnerDeployment spec:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
spec:
  template:
    spec:
      organization: abc
      ephemeral: true
      dockerMTU: 1440
$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh
$ ps auxww
1 root 0:00 docker-init -- dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem --mtu 1440
53 root 0:03 dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem --mtu 1440
61 root 0:04 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
...
$ ip a
...
4: eth0@if700: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
link/ether d2:ee:d5:2f:04:ae brd ff:ff:ff:ff:ff:ff
inet 100.96.2.200/32 brd 100.96.2.200 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::d0ee:d5ff:fe2f:4ae/64 scope link
valid_lft forever preferred_lft forever
- Run the workflow and exec into the workflow Docker container:
$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh
$ docker ps
$ docker exec -it <containerID> bash
$ apt update
$ apt install curl -y
$ curl -vOL https://github.com/actions/go-versions/releases/download/1.17.6-1668090892/go-1.17.6-linux-x64.tar.gz
0 0 0 0 0 0 0 0 --:--:-- 0:00:54 --:--:-- 0
-> nothing is happening...
$ apt install iproute2 -y
$ ip a
...
eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
-> wrong MTU
Expected behavior: I would expect that downloading files works from Docker containers. If I manually start a Docker container from the dind sidecar container, there are no network problems:
$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh
$ docker run -it --rm ubuntu bash
$ apt update
$ apt install curl -y
$ curl -vOL https://github.com/actions/go-versions/releases/download/1.17.6-1668090892/go-1.17.6-linux-x64.tar.gz
100 128M 100 128M 0 0 22.9M 0 0:00:05 0:00:05 --:--:-- 23.7M
--> download completed
$ apt install iproute2 -y
$ ip a
...
eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440
-> correct MTU
Environment:
- Helm Chart Version: 0.14
- Controller Version: 0.20.2
- Deployment Method: Helm + Kustomize + fluxV2
- Kubernetes: ip-in-ip Calico on AWS
@erichorwath Hey. Please set dockerMTU appropriately as documented in https://github.com/actions-runner-controller/actions-runner-controller#additional-tweaks. Reading through all the docker-mtu-related issues we have experienced (https://github.com/actions-runner-controller/actions-runner-controller/issues?q=label%3A%22docker+mtu+issue%22+is%3Aclosed) would also help.
@mumoshu thanks for your quick reply! I have set the dockerMTU based on the documentation, which works perfectly fine EXCEPT for workflows where "container" is specified in the jobs section, as described above. Unfortunately, our infrastructure does not allow us to have 1500 as the host network MTU, so we have to rely on proper dockerMTU propagation...
@erichorwath Hey. Thanks for your confirmation. For the remaining MTU issue for the service containers, the only rescue I'm aware of is to create a wrapper script for docker network create, which is used for setting up Docker networks for those service containers. It seems to be a Docker issue that the MTU setting is not inherited from the daemon by custom Docker networks...
https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653
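A minimal sketch of that behaviour from inside the dind sidecar (the network name below is a placeholder, and the daemon is assumed to have been started with --mtu 1440 as in the report above):

# The default bridge picks up the daemon's --mtu flag:
docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}'
# returns 1440

# A user-defined network (what the runner creates for job and service containers) does not:
docker network create example-net
docker run --rm --network example-net alpine cat /sys/class/net/eth0/mtu
# prints 1500
docker network rm example-net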
@mumoshu oh, that explains the problem!
Which part of the code is responsible for passing the correct MTU to the docker network create command? Is this part of the GitHub runner behaviour or the actions-runner-controller Go code?
@erichorwath The problem is that neither the actions runner nor Docker sets the proper MTU on docker network create. As a workaround, you'd need to create a custom script named docker and put it in $PATH within the runner containers, so that it is called by the actions runner process. In the custom script, you auto-add the MTU if the sub-command is network create, and just proxy all the args as-is to the original docker command in all other cases.
That's what the script introduced at https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653 does.
Our runner image is built from https://github.com/actions-runner-controller/actions-runner-controller/blob/master/runner/Dockerfile. You'd need to add the custom script to that Dockerfile and build your own runner image out of it. A custom runner image can be specified via the controller-manager's --runner-image flag or via RunnerDeployment.Spec.Template.Image; the latter is probably easier for you.
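As a rough sketch of that wiring (the base tag, user name, file names and paths below are assumptions, not taken from the upstream Dockerfile), the shim could be baked into a derived image and then referenced from the RunnerDeployment spec:

# Build context: a wrapper script named "docker-shim" (e.g. the one shown later
# in this thread) sitting next to this Dockerfile.
cat > Dockerfile <<'EOF'
FROM summerwind/actions-runner:latest
USER root
# Keep the original docker CLI reachable under a different name.
RUN mv "$(command -v docker)" /usr/local/bin/docker.bin
# Install the wrapper as "docker" so the actions runner picks it up via $PATH.
COPY docker-shim /usr/local/bin/docker
RUN chmod +x /usr/local/bin/docker
USER runner
EOF

docker build -t <your-registry>/actions-runner-mtu:latest .
docker push <your-registry>/actions-runner-mtu:latest

The resulting image then goes into RunnerDeployment.Spec.Template.Image (or the controller's --runner-image flag), as described above.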
Hello,
I've come across this issue as well since the Kubernetes network I'm setting up uses a smaller MTU (1420 instead of the usual 1500).
Despite the MTU being configurable with the dockerMTU setting on the Runner spec, it only affects the default Docker network and does not propagate to networks created by the GitHub runner. This poses a problem when running workflows that use containers, such as the actions/checkout@v2 one.
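For reference, the manual workaround that a shim like the one below automates is to pass the MTU explicitly on every network create (the network name and MTU value here are placeholders):

docker network create --opt com.docker.network.driver.mtu=1420 job-net
docker run --rm --network job-net alpine cat /sys/class/net/eth0/mtu
# prints 1420
docker network rm job-net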
So, I followed up on the workaround proposed by @FalconerTC here and changed it to support the sidecar dind container, since /etc/docker/daemon.json is not available:
#!/usr/bin/env bash
# Inspired by https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653
# Inspired by https://github.com/actions/runner/issues/775#issuecomment-927826684
if [[ $1 = "network" ]] && [[ $2 = "create" ]]; then
  shift; shift # pop the first two parameters
  MTU=$(docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null)
  if [[ ! -z "$MTU" ]]; then
    /usr/local/bin/docker.bin network create --opt com.docker.network.driver.mtu="$MTU" "${@}"
  else
    /usr/local/bin/docker.bin network create "${@}"
  fi
else
  # just call docker as normal if the sub-command is not "network create"
  /usr/local/bin/docker.bin "${@}"
fi
This has the added benefit of propagating whatever MTU is set on the default Docker bridge, which works in both dind and docker-in-runner situations.
I've pushed a Docker image with the change to tiagomelo/github-actions-runner, which basically places the script shim in the Docker container and propagates the Docker MTU to newly created networks.
However, since using containers in GitHub Actions is pretty common and - apparently - so is having Kubernetes overlay networks with MTUs smaller than 1500, I think this should be integrated into the official summerwind/actions-runner image. I'm willing to open a PR with the change if you wish.
Meanwhile, this can be tested by creating a runner with the custom runner image:
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: test-runner
spec:
  dockerEnabled: true
  dockerMTU: 1420
  image: tiagomelo/docker-actions-runner
  repository: <github-repository>
You can validate this is working with:
kubectl exec -it -c runner test-runner -- /bin/bash
within the runner:
docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null
# returns 1420 (or whatever MTU is set)
docker run --rm rancher/curl https://github.com >/dev/null
# should complete
docker network create test
docker network inspect test --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null
# returns 1420 (or whatever MTU is set)
docker run --rm --network test rancher/curl https://github.com >/dev/null
# should complete
docker network rm test
Edit: Published a repository with the image under tiagoblackcode/github-actions-runner.
Can this be reopened, since the issue still persists? We're running into the same issue where we use dind and buildah to create container images.
@tiagoblackcode Were you able to create a PR for this?
@alexdepalex We aren't very eager to fix this on the ARC side because we consider this an issue in actions/runner, but we're happy to review any PR related to this and see if it's feasible for us to maintain it.
@tiagoblackcode Thanks for publishing a Docker image containing the fix!
I agree with @mumoshu, but at the same time I think it's a good idea to include the fix directly in summerwind/actions-runner, since the maintainers of actions/runner haven't acknowledged the issue, neither in https://github.com/actions/runner/issues/775 nor in the open PR https://github.com/actions/runner/pull/1650.
I agree with the general sentiment that the issue is not in ARC but instead in actions/runner, or even in Docker, honestly. Since Docker supports the --mtu flag, it should use that MTU value as the default for any networks created, unless specified otherwise.
However, since it seems we're kind of stuck on where to implement this, I suggest the following:
- Document the issue and the workaround in the Troubleshooting section.
- Optionally, create a secondary image in ARC with the fix and a USE AT YOUR OWN RISK notice. Alternatively, point to my image, but I'd rather have this in the official repository since it has more visibility. I suggest a secondary image because this can have unintended consequences we're not aware of at the moment.
- Bring visibility to the actions/runner#775 issue.
This way, ARC users have a documented way to fix the issue while we wait for actions/runner to support this.
@mumoshu what are your thoughts on this?
@NicklasWallgren regarding https://github.com/actions/runner/pull/1650, instead of having an env variable I think it's better to inspect the bridge network, figure out its MTU and use it to create the network.
Edit: There's an open PR in https://github.com/moby/moby/pull/43197 to propagate the MTU set in --mtu to networks created without an explicit MTU.
@tiagoblackcode Yeah, you are probably right. I did suggest that as an alternative solution in the PR.
I suggest a secondary image because this can have unintended consequences we're not aware of at the moment.
This might be the most viable option 👍
Also, I'd greatly appreciate it if you folks wouldn't mind helping me provide user support around this workaround in the future.
Maybe include some of you in the CODEOWNERS file, so that it's clearer that there are multiple folks, not only me, who are known to have some knowledge about the alternative runner Dockerfile (with the said workaround)?
I'm willing to help you with support.
I'm going to prepare a PR with the secondary image and the documentation regarding it in the next couple of days.
Hello,
I finally had some time to work on this. Please check out #1201, which adds a new Docker image to the repository that propagates the MTU setting to networks created in GitHub workflows.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
With the introduction of --default-network-opt mapmap as an argument for the Docker daemon, is it possible to add an option to specify custom arguments for the Docker sidecar image?
If we could specify dockerArgs: [ '--default-network-opt=bridge=com.docker.network.driver.mtu=1400' ] on RunnerSet.spec or RunnerDeployment.spec.template.spec, then that might solve this issue generally, without wrappers.
The wrapper doesn't help for applications that use the Docker API directly (like the dependabot-action network create).
I'd be happy to do a PR if that seems like a valid option.
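For context, a sketch of what that proposal boils down to on the dind sidecar (the dockerArgs field itself does not exist yet; the MTU value and network name are placeholders, and the flag requires a dockerd release that supports --default-network-opt):

# If the sidecar's dockerd were started with the extra flag (which the proposed
# dockerArgs field would supply), e.g.:
#   dockerd ... --mtu 1400 --default-network-opt=bridge=com.docker.network.driver.mtu=1400
# then networks created without an explicit MTU should pick up the default option,
# including those created directly through the Docker API:
docker network create test-net
docker run --rm --network test-net alpine cat /sys/class/net/eth0/mtu
# prints 1400
docker network rm test-net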