Docker container in dind containerMode cannot connect to GitHub
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.3
Deployment Method
ArgoCD
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Deploy the `gha-runner-scale-set-controller` chart with default values, then deploy the `gha-runner-scale-set` chart with release name `arc-runner-set`.
1.1 At this point, GitHub Actions works for a simple workflow file.
2. Exec into the `runner` container in the `arc-runner-set-****-runner-****` pod.
3. Run `sudo apt update && sudo apt install git -y && git clone https://github.com/actions/actions-runner-controller.git` to confirm the pod has access to the public internet.
4. Run `docker run --rm -it alpine sh -c "apk add git && git clone https://github.com/actions/actions-runner-controller.git"`
Describe the bug
Output from step 4:
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
(1/13) Installing ca-certificates (20240705-r0)
(2/13) Installing brotli-libs (1.1.0-r2)
(3/13) Installing c-ares (1.28.1-r0)
(4/13) Installing libunistring (1.2-r0)
(5/13) Installing libidn2 (2.3.7-r0)
(6/13) Installing nghttp2-libs (1.62.1-r0)
(7/13) Installing libpsl (0.21.5-r1)
(8/13) Installing zstd-libs (1.5.6-r0)
(9/13) Installing libcurl (8.9.0-r0)
(10/13) Installing libexpat (2.6.2-r0)
(11/13) Installing pcre2 (10.43-r0)
(12/13) Installing git (2.45.2-r0)
(13/13) Installing git-init-template (2.45.2-r0)
Executing busybox-1.36.1-r29.trigger
Executing ca-certificates-20240705-r0.trigger
OK: 20 MiB in 27 packages
Cloning into 'actions-runner-controller'...
fatal: unable to access 'https://github.com/actions/actions-runner-controller.git/': SSL connection timeout
Describe the expected behavior
The `docker run` command above should complete without an SSL connection timeout error.
Additional Context
The YAML manifests I am using to deploy `gha-runner-scale-set-controller` and `gha-runner-scale-set`:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: arc
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: ghcr.io/actions/actions-runner-controller-charts
    targetRevision: 0.9.3
    chart: gha-runner-scale-set-controller
    helm:
      releaseName: arc
  destination:
    name: in-cluster
    namespace: arc-systems
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=false
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  revisionHistoryLimit: 3
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: arc-runner-set
  namespace: argocd
spec:
  project: default
  destination:
    name: in-cluster
    namespace: arc-runners
  syncPolicy:
    automated:
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  revisionHistoryLimit: 3
  source:
    repoURL: ghcr.io/actions/actions-runner-controller-charts
    targetRevision: 0.9.3
    chart: gha-runner-scale-set
    helm:
      releaseName: arc-runner-set
      parameters:
        - name: controllerServiceAccount.namespace
          value: arc-systems
        - name: controllerServiceAccount.name
          value: arc-gha-rs-controller
        - name: githubConfigUrl
          value: https://github.com/<organization>
        - name: minRunners
          value: "5"
        - name: containerMode.type
          value: dind
        - name: githubConfigSecret
          value: github-app-secret
Controller Logs
https://gist.github.com/duchuyvp/9b626aec67926976f09c52d303becd1a
Runner Pod Logs
These are the logs from pushing this workflow file:
name: Reproduce
on:
  push:
    branches: ['*']
jobs:
  push-reproduce:
    runs-on: arc-runner-set
    steps:
      - run: sudo apt update && sudo apt install git -y
      - run: git clone https://github.com/actions/actions-runner-controller.git
      - run: docker run --rm alpine sh -c "apk add git && git clone https://github.com/actions/actions-runner-controller.git"
https://gist.github.com/duchuyvp/6a5db187bfb3657a5361bcf62b0bd4ef
@duchuyvp , do you happen to run the deployment on GKE?
@norman-zon I haven't tested on GKE; I deployed on-premises.
Try setting MTU for the docker daemon like:
- name: dind
  image: docker:dind
  args:
    - dockerd
    - --host=unix:///var/run/docker.sock
    - --group=$(DOCKER_GROUP_GID)
    - --mtu=1460
The default docker daemon MTU is 1500, but my host network has 1460. So aligning the docker daemon MTU fixed it for me.
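A rough sketch of the arithmetic behind this, assuming IPv4 and TCP headers without options (20 bytes each). The function names are illustrative and not part of Docker or ARC. A container on a 1500-byte bridge advertises a segment size that only fits a 1500-byte path, so full-size replies can be too large for a 1460-byte host network, while small packets still pass. That would be consistent with a handshake that starts and then times out.

```python
# Header sizes in bytes for IPv4 and TCP without options (standard minimums).
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in a single IPv4 packet on a link with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def oversize_bytes(bridge_mtu: int, path_mtu: int) -> int:
    """Bytes by which a full-size packet from the bridge exceeds the path MTU (0 if it fits)."""
    return max(0, bridge_mtu - path_mtu)


if __name__ == "__main__":
    # Values from this thread: Docker bridge default 1500, host network 1460.
    for bridge_mtu in (1500, 1460):
        print(
            f"bridge MTU {bridge_mtu}: advertised MSS {tcp_mss(bridge_mtu)}, "
            f"full-size packets exceed a 1460 path by {oversize_bytes(bridge_mtu, 1460)} bytes"
        )
```

With the bridge at 1500 the overshoot is 40 bytes; once the bridge MTU matches the host network, it is 0.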
@norman-zon Thank you so much, your idea works for me too. I patched one runner pod to add --mtu=1450 to the dind container, but I don't know how to add this argument when deploying with Helm, since the dind container seems to be hard-coded in the gha-runner-scale-set chart:
https://github.com/actions/actions-runner-controller/blob/a152741a1a6afa992f8d836a029d551984149c8f/charts/gha-runner-scale-set/templates/_helpers.tpl#L98-L116
Could you please show me how?
I ended up using the solution with a configMap as described in the discussion here.
You have to set
containerMode:
  type: none
and then completely specify the template for the container, as described in the values file.
This would be easier to add to the dind container if my PR were merged...
Unfortunately this didn't solve our issue, which is ostensibly the same.
We have self-hosted runners in an on-premises OpenStack K8s cluster. For container actions that specify our own helper image (with some useful utilities installed), we cannot connect to GitHub to clone the relevant repository. We have tried the checkout action, the GitHub CLI, and standard git with auth set up in the job.
After seeing this post, we modified the DinD container as suggested, passing the mtu argument, and verified that it was indeed being set. As a test, we followed the GP's example: cloning from the runner container after installing git succeeded, but cloning from the spawned helper container with its already installed git failed. All the tests we have conducted resulted in variations on the same theme, SSL/TLS timeout errors:
kubectl exec -it github-runner-scale-set-hello-world-cbr74-runner-jdr2z -- sh
Defaulted container "runner" out of: runner, dind, init-dind-externals (init)
$ sudo apt install git -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
<snipped>
Setting up git (1:2.46.0-0ppa1~ubuntu22.04.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
$ git clone https://github.com/actions/actions-runner-controller.git <-- we can clone in runner container after installing git
Cloning into 'actions-runner-controller'...
remote: Enumerating objects: 12348, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 12348 (delta 11), reused 8 (delta 1), pack-reused 12321 (from 1)
Receiving objects: 100% (12348/12348), 5.44 MiB | 33.33 MiB/s, done.
Resolving deltas: 100% (8430/8430), done.
$ ls -ltr actions-runner-controller
drwxr-xr-x 23 runner runner 4096 Aug 14 06:42 actions-runner-controller
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cd3c11559488 ghcr.io/***/pipeline-helper:0.0.4 "tail -f /dev/null" About a minute ago Up About a minute e588e3cf54e848bd99acc500aeec932e_ghcrio***pipelinehelper004_3c7f01
$ docker exec -it cd3c11559488 sh
/ # git --version <-- git already installed in container job
git version 2.45.2
/ # git clone https://github.com/actions/actions-runner-controller.git
Cloning into 'actions-runner-controller'...
fatal: unable to access 'https://github.com/actions/actions-runner-controller.git/': SSL connection timeout
Error: Process completed with exit code 128.
The specific error when using the GitHub CLI was: error validating token: Get "https://api.github.com/": net/http: TLS handshake timeout
@nikola-jokic Hi. I am not sure why the original Helm chart offers no way to change the DinD config, given how it is defined in the chart's _helpers.tpl:
{{- define "gha-runner-scale-set.dind-container" -}}
image: docker:dind
args:
  - dockerd
  - --host=unix:///var/run/docker.sock
  - --group=$(DOCKER_GROUP_GID)
env:
  - name: DOCKER_GROUP_GID
    value: "123"
securityContext:
  privileged: true
volumeMounts:
  - name: work
    mountPath: /home/runner/_work
  - name: dind-sock
    mountPath: /var/run
  - name: dind-externals
    mountPath: /home/runner/externals
{{- end }}
In my values file I specified the following (along with the init and runner containers):
template:
  spec:
    containers:
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --mtu=1400
This works for the default network, but dependabot creates its own networks with no MTU setting, so they default to 1500 and dependabot breaks.
So that would fix the auto-created networks, but it won't help if you create docker networks as part of your actions.
I ended up using the solution discussed here: writing a daemon.json ConfigMap and mounting it inside the container at /etc/docker/daemon.json.
This allows setting
"bridge": {
  "com.docker.network.driver.mtu": "1460"
}
which is also used for all networks created by actions.
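As a minimal sketch of that approach, the script below prints a ConfigMap manifest in JSON form (which `kubectl apply -f -` accepts) containing a daemon.json. It assumes the fragment above sits under the `default-network-opts` key of daemon.json, which exists only in recent Docker Engine releases. The ConfigMap name `dind-daemon-config` is hypothetical; the namespace `arc-runners` is taken from the manifest earlier in this thread. Mounting the ConfigMap into the dind container still has to be done in the custom pod template.

```python
import json


def daemon_config(mtu: int) -> dict:
    """daemon.json content: MTU for the default bridge and for newly created bridge networks."""
    return {
        "mtu": mtu,
        # Assumption: the "bridge" fragment from the thread belongs under this key.
        "default-network-opts": {
            "bridge": {"com.docker.network.driver.mtu": str(mtu)}
        },
    }


def config_map(name: str, namespace: str, mtu: int) -> dict:
    """Kubernetes ConfigMap manifest holding daemon.json as a single data key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"daemon.json": json.dumps(daemon_config(mtu), indent=2)},
    }


if __name__ == "__main__":
    # "dind-daemon-config" is a hypothetical name; pick the MTU of your pod network.
    print(json.dumps(config_map("dind-daemon-config", "arc-runners", 1460), indent=2))
```

If `mtu` is set in daemon.json, passing `--mtu` on the dockerd command line as well is likely to be rejected as a conflicting option, so it is probably safest to set it in one place only.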
I was going to update today when I saw that moby/moby#43197 has been merged (earlier this year or late last year). It solves my issue by adding this argument: --default-network-opt=bridge=com.docker.network.driver.mtu=1400.
Now when dependabot calls the Docker API (not through a shell, so the shims don't help) to create a network for the updater container, the network has its MTU set to 1400.
template:
  spec:
    containers:
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --mtu=1400
          - --default-network-opt=bridge=com.docker.network.driver.mtu=1400
From the dind container in the dependabot runner pod.
$ docker network inspect dependabot-job-11050-external-network
Output (cut for size):
[
  {
    "Name": "dependabot-job-11050-external-network",
    "Id": "dff4d1a3f843634c060258f5e808050ac9861ba487a0a0c677278506321374ea",
    "Created": "2024-08-20T07:10:54.585512615Z",
    "Scope": "local",
    "Driver": "bridge",
    "EnableIPv6": false,
    "IPAM": { ... },
    "Internal": false,
    "Attachable": false,
    "Ingress": false,
    "ConfigFrom": {
      "Network": ""
    },
    "ConfigOnly": false,
    "Containers": { ... },
    "Options": {
      "com.docker.network.driver.mtu": "1400"
    },
    "Labels": {}
  }
]
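To check every network rather than one, a small script can scan the JSON that `docker network inspect` prints. This is only a sketch: the helper name and the sample network names are made up, and it relies solely on the `Name`, `Driver` and `Options` fields visible in the output above.

```python
import json

MTU_KEY = "com.docker.network.driver.mtu"


def networks_with_wrong_mtu(inspect_json: str, expected_mtu: int) -> list:
    """Names of bridge networks whose MTU option is missing or differs from expected_mtu."""
    wrong = []
    for net in json.loads(inspect_json):
        if net.get("Driver") != "bridge":
            continue  # host/null networks carry no bridge MTU option
        options = net.get("Options") or {}
        if options.get(MTU_KEY) != str(expected_mtu):
            wrong.append(net.get("Name", "<unnamed>"))
    return wrong


# Hypothetical sample in the same shape as `docker network inspect` output.
SAMPLE = json.dumps([
    {"Name": "good-net", "Driver": "bridge", "Options": {MTU_KEY: "1400"}},
    {"Name": "job-net", "Driver": "bridge", "Options": {}},
    {"Name": "host", "Driver": "host", "Options": {}},
])

if __name__ == "__main__":
    print(networks_with_wrong_mtu(SAMPLE, 1400))
```

In practice the input would come from something like `docker network inspect $(docker network ls -q)` run inside the dind container, passed to the function as a string.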
Maybe these two options (container args and ConfigMap) should be added to the docs, considering how many reactions this issue got?
The same issue occurs on an older version (0.9.0).
`curl -v https://github.com` fails at the TLS Client hello (1) step,
but `curl -v --resolve github.com:443:140.82.121.3 https://github.com/` works.
It also works through a proxy.
Working workaround:
- Set up a ConfigMap for daemon.json with a lower MTU (1400).
- Comment out "containerMode.type=dind".
- Use a custom template (https://github.com/actions/actions-runner-controller/discussions/2993#discussioncomment-8071798).
After this patch it works with 0.9.3.
Any idea why only GitHub has this connectivity issue? What bug should be raised?
For those who came here and don't know exactly what to do (as I didn't), here is how I "fixed" it.
My setup is:
- baremetal machine
- docker setup with containerd
- vanilla k8s installation
- flannel cni - no modifications
My Kubernetes CNI is set up to use an MTU of 1450, so I changed the Docker MTU to 1450 and applied these manifests:
Docker Daemon JSON:
{
"mtu": 1450,
"dns": [ "<your-ipv4-gateway>", "8.8.8.8", "8.8.4.4"],
"hosts": ["unix:///var/run/docker.sock", "tcp://127.0.0.1:2375"]
}
Helm Command:
helm upgrade --install --namespace actions-runner-system --create-namespace --set=authSecret.create=true -f values.yaml --set=authSecret.github_token="<your-token>" --wait actions-runner-controller actions-runner-controller/actions-runner-controller
For the Actions Runner Controller Helm Chart values file:
runner:
  containerMode:
    type: "dind"
For the Runner Deployment configuration:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: app
spec:
  replicas: 1
  template:
    spec:
      image: summerwind/actions-runner-dind
      dockerdWithinRunnerContainer: true
      repository: owner/repo
      dockerMTU: 1450
      env:
        - name: ARC_DOCKER_MTU_PROPAGATION
          value: "true"
Now, for me, checkout is working perfectly.
Thanks to @na4ma4 for the answer; this works for me.
I'm using gha-runner-scale-set in dind mode. One workflow needs to pull and run an image in the github-runner pod. When the workflow starts, the runner creates a bridge-type network interface with an MTU of 1500, while the Kubernetes pod network is configured with an MTU of 1450. The MTU mismatch can cause dropped packets, which results in the following error:
unable to access 'https://github.com/xxx/xxx/': gnutls_handshake() failed: Error in the pull function.
Fix: use --default-network-opt=bridge=com.docker.network.driver.mtu=1450
image: docker:dind
args:
  - dockerd
  - --host=unix:///run/docker/docker.sock
  - --group=$(DOCKER_GROUP_GID)
  - --default-network-opt=bridge=com.docker.network.driver.mtu=1450
I think this is safe to close. Thank you, everyone, for providing answers!