actions-runner-controller icon indicating copy to clipboard operation
actions-runner-controller copied to clipboard

containers create by Github workflow have wrong dockerMTU

Open erichorwath opened this issue 3 years ago • 16 comments

Describe the bug No network possible from Docker containers created by Github workflows (dind - Docker in Docker)

To Reproduce Steps to reproduce the behavior:

  1. create workflow with docker image, e.g.
name: Setup Go

on:
  workflow_dispatch:

jobs:
  setupgo:
    runs-on: [kubernetes]
    container:
      image: 'ubuntu:latest'
    steps:
    - name: Set up Go
      uses: actions/setup-go@v2
      with:
        go-version: 1.17
    
    - name: sleep
      shell: bash
      run: sleep 300
  1. exec into the dind sidecar container and confirm that mtu parameter is propagated properly from runner deployment spec:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
spec:
  template:
    spec:
      organization: abc
      ephemeral: true
      dockerMTU: 1440
$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh 
$ ps auxww
    1 root      0:00 docker-init -- dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem --mtu 1440
   53 root      0:03 dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem --mtu 1440
   61 root      0:04 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
   ...
$ ip a
  ...
  4: eth0@if700: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
      link/ether d2:ee:d5:2f:04:ae brd ff:ff:ff:ff:ff:ff
      inet 100.96.2.200/32 brd 100.96.2.200 scope global eth0
         valid_lft forever preferred_lft forever
      inet6 fe80::d0ee:d5ff:fe2f:4ae/64 scope link
         valid_lft forever preferred_lft forever
  1. run the workflow and exec into the workflow docker container
$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh
$ docker ps
$ docker exec -it <containerID> bash
$ apt update
$ apt install curl -y
$ curl -vOL https://github.com/actions/go-versions/releases/download/1.17.6-1668090892/go-1.17.6-linux-x64.tar.gz
  0     0    0     0    0     0      0      0 --:--:--  0:00:54 --:--:--     0
  -> nothing is happening...
$ apt install iproute2 -y
$ ip a 
  ...
  eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
  -> wrong MTU

Expected behavior I would expect that downloading files should work from docker containers. If I manually start a docker container from the dind sidecar container -> no network problems:

$ kubectl exec -it my-runner-xyz -c docker -- /bin/sh
$ docker run -it --rm ubuntu bash
$ apt update
$ apt install curl -y
$ curl -vOL https://github.com/actions/go-versions/releases/download/1.17.6-1668090892/go-1.17.6-linux-x64.tar.gz
   100  128M  100  128M    0     0  22.9M      0  0:00:05  0:00:05 --:--:-- 23.7M
   --> download completed
$ apt install iproute2 -y
$ ip a 
  ...
  eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440
  -> correct MTU

Environment:

  • Helm Chart Version 0.14
  • Controller Version: 0.20.2
  • Deployment Method: Helm + Kustomize + fluxV2
  • Kubernetes: ip-in-ip Calico on AWS

erichorwath avatar Jan 11 '22 22:01 erichorwath

@erichorwath Hey. Please set dockerMTU appropriately as documented in https://github.com/actions-runner-controller/actions-runner-controller#additional-tweaks. Reading through all the docker-mtu related issues we had experienced https://github.com/actions-runner-controller/actions-runner-controller/issues?q=label%3A%22docker+mtu+issue%22+is%3Aclosed would also help.

mumoshu avatar Jan 11 '22 23:01 mumoshu

@mumoshu thanks for your quick reply! I have set the dockerMTU based on the documentation, which works perfectly fine EXCEPT for workflows where "container" is specified in the jobs section as described above. Unfortunately our infrastructure does not allow us to have 1500 as the host network MTU, so we have to rely on a proper dockerMTU propagation...

erichorwath avatar Jan 11 '22 23:01 erichorwath

@erichorwath Hey. Thanks for your confirmation. For the remaining MTU issue for the service containers, the only rescue I'm aware of is the create a wrapper script for docker network create which is used for setting up docker networks for those service containers. It seems like it's a docker issue that it doesn't inherit MTU setting from the daemon to the custom docker networks...

https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653

mumoshu avatar Jan 11 '22 23:01 mumoshu

@mumoshu oh, that explains the problem!

Which part of the coding is able to parse the correct MTU to the docker network create command? Is this part of Github Runner behaviour or the action-runner-controller go code?

erichorwath avatar Jan 11 '22 23:01 erichorwath

@erichorwath The problem is that neither actions runner nor docker sets the proper MTU on docker network create. As a workaround, you'd need to create a custom script named docker and put it in $PATH within runner containers, so that it is called by the actions runner process.

In the custom script, you auto-add MTU if the sub-command is network create, and just proxy all the args as-is to the original docker command for other cases. That's what's the script introduced at https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653 is doing.

Our runner image is build from https://github.com/actions-runner-controller/actions-runner-controller/blob/master/runner/Dockerfile. You'd need to add the custom script to the Dockerfile and build your own runner image out of it. A custom runner image can be specified via controller-manager's --runner-image flag or RunnerDeployment.Spec.Template.Image. Probably the latter is easier for you.

mumoshu avatar Jan 12 '22 00:01 mumoshu

Hello,

I've come across this issue as well since the Kubernetes network I'm setting up uses a smaller MTU (1420 instead of the usual 1500).

Despite the MTU being configurable with the dockerMTU setting on the Runner spec, it only affects the default docker network and does not propagate to networks created by the Github runner. This poses a problem when running workflows that use containers such as the actions/checkout@v2 one.

So, I followed-up on the workaround proposed by @FalconerTC here and changed it to support the sidecar dind container since /etc/docker/daemon.json is not available:

#!/usr/bin/env bash

# Inspired by https://github.com/actions-runner-controller/actions-runner-controller/issues/848#issuecomment-929394653
# Inspired by https://github.com/actions/runner/issues/775#issuecomment-927826684

if [[ $1 = "network" ]] && [[ $2 = "create" ]] ; then
    shift; shift #pop 2 first parameters

    MTU=$(docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null); 
    if [[ ! -z "$MTU" ]]; then 
        /usr/local/bin/docker.bin network create --opt com.docker.network.driver.mtu=$MTU "${@}"  
    else
        /usr/local/bin/docker.bin network create "${@}"
    fi
else
    #just call docker as normal if not network create
    /usr/local/bin/docker.bin "${@}"
fi

This has the added benefit of propagating whatever MTU is set on the default docker bridge, which works in both dind and docker in runner situations.

I've pushed a docker image with the change to tiagomelo/github-actions-runner which basically places the script shim the docker container and propagate the docker MTU to newly created networks.

However, since using containers in Github Actions is pretty common and - apparently - having Kubernetes overlay networks with MTUs smaller than 1500, I think this should be integrated in the official summerwind/actions-runner image. I'm willing to open a PR with the change if you wish.

Meanwhile, this can be tested by creating a runner with the custom runner image:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: test-runner
spec:
  dockerEnabled: true
  dockerMTU: 1420
  image: tiagomelo/docker-actions-runner
  repository: <github-repository>

You can validate this is working with:

kubectl exec -it -c runner test-runner -- /bin/bash

within the runner:

docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null
# returns 1420 (or whatever MTU is set)

docker run --rm rancher/curl https://github.com >/dev/null
# should complete

docker network create test

docker network inspect bridge --format '{{index .Options "com.docker.network.driver.mtu"}}' 2>/dev/null
# returns 1420 (or whatever MTU is set)

docker run --rm --network test rancher/curl https://github.com >/dev/null
# should complete

docker network rm test

Edit: Published a repository with the image under tiagoblackcode/github-actions-runner.

tiagoblackcode avatar Feb 16 '22 10:02 tiagoblackcode

Can this be reopened since the issue still persist? We're running into the same issue where we use dind and buildah to create container images.

@tiagoblackcode Were you able to create a PR for this?

alexdepalex avatar Feb 17 '22 22:02 alexdepalex

@alexdepalex We aren't very eager to fix this on ARC side because we consider this as an issue in actions/runner, but happy to review any PR related to this and see if it's feasible for us to maintain it.

mumoshu avatar Feb 17 '22 23:02 mumoshu

@tiagoblackcode Thanks for publishing a docker image containing the fix!

I agree with @mumoshu, but at the same time I think it's a good idea to include the fix directly in summerwind/actions-runner, since the maintainers of actions/runner doesn't acknowledge the issue neither in https://github.com/actions/runner/issues/775 nor the open PR https://github.com/actions/runner/pull/1650

NicklasWallgren avatar Feb 18 '22 07:02 NicklasWallgren

I agree with the general sentiment that the issue is not in ARC but instead in actions/runner or even in docker, honestly. Since docker supports the --mtu flag, it should use its MTU value as default to any networks created unless specified otherwise.

However, since it seems we're kind of stuck on where to implement this, I suggest the following:

  • Document in the Troubleshooting section the issue, and the workaround.
  • Optionally, create a secondary image in ARC with the fix, and a USE AT YOUR OWN RISK advert. Alternatively, point to my image, but I'd rather have this in the official repository since it has more visibility. I suggest a secondary image because this can have unintended consequences we're not aware of at the moment.
  • Bring visibility to the actions/runner#775 issue.

This way ARC users have a documented way to fix the issue while we wait for actions/runner to support this.

@mumoshu what are your thoughts on this?

@NicklasWallgren regarding https://github.com/actions/runner/pull/1650, instead of having an env variable I think it's better to inspect the bridge network, figure out its MTU and use it to create the network.

Edit:

There's an open PR in https://github.com/moby/moby/pull/43197 to propagate the MTU set in --mtu to networks created without an explicit MTU.

tiagoblackcode avatar Feb 18 '22 09:02 tiagoblackcode

@tiagoblackcode Yeah, you are probably right. I did suggest that as an alternative solution in the PR.

NicklasWallgren avatar Feb 19 '22 06:02 NicklasWallgren

I suggest a secondary image because this can have unintended consequences we're not aware of at the moment.

This might be the most viable option 👍

Also, I'd greatly appreciate it if you folks mind helping me provide user support around this workaround in the future.

Maybe- include some of you into CODEOWNERS file so that it's more clear that there are multiple folks not only me but also you who is known to have some knowledge about the alternative runner dockerfile (with the said workaround)?

mumoshu avatar Feb 19 '22 06:02 mumoshu

I'm willing to help you with support.

I'm gonna prepare a PR with the secondary image and regarding it in the next couple of days.

tiagoblackcode avatar Feb 21 '22 09:02 tiagoblackcode

Hello,

I finally had some time to work on this. Please check out #1201 which adds a new docker image to the repository that propagates the MTU setting to networks created in Github workflows.

tiagoblackcode avatar Mar 10 '22 14:03 tiagoblackcode

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 09 '22 15:04 stale[bot]

With the introduction of --default-network-opt mapmap as an argument for the docker daemon, is it possible to add an option to specify custom arguments for the docker sidecar image ?

If we could specify dockerArgs: [ '--default-network-opt=bridge=com.docker.network.driver.mtu=1400' ] to RunnerSet.spec or RunnerDeployment.spec.template.spec then that might solve this issue generally without wrappers ?

The wrapper doesn't help for applications that use the docker API directly (like the dependabot-action network create)

I'd be happy to do a PR if that seems like a valid option.

na4ma4 avatar Feb 15 '24 01:02 na4ma4