github-actions-runner-operator
After a couple of hours of not using the runner, it will become offline
So after everything is up and running, I ran some jobs and all was good. But then, after not using the runner for some hours, it becomes offline, and I have to delete the runner pool and delete the runner from GitHub so that it gets recreated. Is there some setting like an idle timeout or something like that? I do have minRunners set to 1 in my CRD.
Hello?
What do the logs of the pod say? I am no longer working with Tietoevry and no longer have write access to the repo, but I might be able to shed some light on what is wrong.
Hi @davidkarlsen
we have this in the logs of the pod:
# Runner removal
Cannot connect to server, because config files are missing. Skipping removing runner from the server.
Does not exist. Skipping Removing .credentials
Does not exist. Skipping Removing .runner
--------------------------------------------------------------------------------
|       ____ _ _   _   _       _           _        _   _                      |
|      / ___(_) |_| | | |_   _| |__       / \   ___| |_(_) ___  _ __  ___      |
|     | |  _| | __| |_| | | | | '_ \     / _ \ / __| __| |/ _ \| '_ \/ __|     |
|     | |_| | | |_|  _  | |_| | |_) |   / ___ \ (__| |_| | (_) | | | \__ \     |
|      \____|_|\__|_| |_|\__,_|_.__/   /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------
# Authentication
√ Connected to GitHub
# Runner Registration
A runner exists with the same name
A runner exists with the same name runner-pool-pod-s4jd2.
Looks like the runner can't be removed, which causes a crashloop since a runner with the same name is already registered. Do you know where to look to find out why the removal is failing?
Cannot connect to server, because config files are missing.
Then there is already a runner registered with this name in the GitHub console; force-remove those. It may be wise to set the pod restartPolicy to Never, so that the pods don't reappear and end up in this state.
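For reference, the stale registration can also be force-removed via the GitHub REST API instead of the web console. A minimal sketch using the gh CLI; OWNER, REPO and RUNNER_ID are placeholders, and org-level runners would use the /orgs/ORG/actions/runners endpoints instead:

# List the repo's self-hosted runners and note the id of the offline/stale one
gh api repos/OWNER/REPO/actions/runners

# Force-remove the stale registration by id
gh api -X DELETE repos/OWNER/REPO/actions/runners/RUNNER_ID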
BTW: Love traktor ;-)
I think this is the problem: we always need to force-remove the runner and sometimes recreate the runner pool. I've added restartPolicy: Never, let's see if it helps:
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: github-actions-runner-operator
spec:
  minRunners: 1
  maxRunners: 9
  reconciliationPeriod: 1m
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      restartPolicy: Never
      affinity:
@davidkarlsen just wanted to confirm that we haven't seen any issues for a week now; looks like it helped, thank you!
@davidkarlsen it happened again, even with restartPolicy: Never.
In the meantime, until a fix is found, I ended up setting up a CronJob that runs every day inside Kubernetes to delete the pod. This way a new one is spawned and is automatically registered in GitHub. I know it's not a real solution, it just helps us focus on other things that are more urgent. Here is the manifest for such a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-runner-pod
spec:
  schedule: "00 22 * * *" # daily at 22:00 UTC (midnight Berlin time during CEST)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: kubectl
              image: bitnami/kubectl # using a kubectl container image
              command:
                - /bin/sh
                - -c
                - kubectl get pods --no-headers=true | awk '/^runner-pool-pod-/ {print $1}' | xargs -I {} kubectl delete pod {} --grace-period=0 --force
          restartPolicy: OnFailure
      ttlSecondsAfterFinished: 172800 # delete finished jobs after 48 hours
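One caveat with this workaround: by default the job's pod runs under the namespace's default service account, which usually has no permission to list or delete pods, so the kubectl command will be denied. Below is a minimal RBAC sketch, assuming everything runs in the github-actions-runner-operator namespace; the name runner-pod-cleaner is a placeholder and would need to be referenced from the CronJob's pod spec via serviceAccountName: runner-pod-cleaner.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: runner-pod-cleaner
  namespace: github-actions-runner-operator
---
# Least-privilege role: only what the cleanup command needs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-pod-cleaner
  namespace: github-actions-runner-operator
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
---
# Bind the role to the service account used by the CronJob's pods
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: runner-pod-cleaner
  namespace: github-actions-runner-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: runner-pod-cleaner
subjects:
  - kind: ServiceAccount
    name: runner-pod-cleaner
    namespace: github-actions-runner-operator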