github-actions-runner-operator

Finalization of pods not run when CR is deleted

Open aroemen opened this issue 4 years ago • 17 comments

Running kubectl apply -f .\gh-runners-linux.yaml creates the runners as expected in my GitHub organization. When I delete them though (using kubectl delete -f .\gh-runners-linux.yaml), the pods that contained the runners get stuck in a "Terminating" status.

NAMESPACE                        NAME                                              READY   STATUS        RESTARTS   AGE
github-action-runners            runner-pool-pod-fhthp                             0/3     Terminating   0          4m50s
github-action-runners            runner-pool-pod-wfs62                             0/3     Terminating   0          4m50s
github-actions-runner-operator   github-actions-runner-operator-59b9d486b5-t2p62   1/1     Running       0          5m26s

If I edit the pod and remove the finalizer (garo.tietoevry.com/runner-registration), the pod is deleted successfully once I save that change. However, the runner is not removed from my list of GitHub self-hosted runners, as I would expect it to be. Am I missing something here?

aroemen avatar Mar 11 '21 18:03 aroemen

Then there is a problem with unregistration. Please provide logs from the operator so that I can help you.

davidkarlsen avatar Mar 11 '21 18:03 davidkarlsen

@davidkarlsen I don't see any mention of the delete in the operator log. The delete command was issued at 12:34:23 which is the last time there is anything in the operator logs here:

2021-03-11T18:30:34.050Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	setup	starting manager
I0311 18:30:34.052860       1 leaderelection.go:243] attempting to acquire leader lease github-actions-runner-operator/4ef9cd91.tietoevry.com...
2021-03-11T18:30:34.052Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
I0311 18:30:51.471375       1 leaderelection.go:253] successfully acquired lease github-actions-runner-operator/4ef9cd91.tietoevry.com
2021-03-11T18:30:51.471Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"github-actions-runner-operator","name":"4ef9cd91.tietoevry.com","uid":"830a98c7-1d79-4fd4-8b16-27048338c333","apiVersion":"v1","resourceVersion":"156761"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-59b9d486b5-hbsrz_a1bc3d27-328e-490c-86e3-4e6033887fbf became leader"}
2021-03-11T18:30:51.472Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.573Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.674Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.775Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting Controller	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2021-03-11T18:30:51.775Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting workers	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2021-03-11T18:30:51.775Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.172Z	INFO	controllers.GithubActionRunner	Scaling up	{"githubactionrunner": "github-action-runners/runner-pool", "numInstances": 2}
2021-03-11T18:30:52.182Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-4ts8j", "result": "created"}
2021-03-11T18:30:52.182Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-4ts8j"}
2021-03-11T18:30:52.186Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-779pp", "result": "created"}
2021-03-11T18:30:52.186Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-779pp"}
2021-03-11T18:30:52.256Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.401Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:31:52.256Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:32:52.502Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:33:52.687Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:22.734Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:23.141Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:52.876Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}

aroemen avatar Mar 11 '21 18:03 aroemen

Sorry, I just noticed I put this on the wrong project. This should probably be on the github-actions-runner-operator project rather than here. Let me know if you want me to move it.

aroemen avatar Mar 11 '21 18:03 aroemen

That's strange. What version of the operator are you running? Can you provide the CR for the runner pool?

davidkarlsen avatar Mar 11 '21 21:03 davidkarlsen

I'm running the latest version from helm charts 2.5.10. I'm just testing locally in my k8s environment in docker on win10.

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: github-action-runners
spec:
  minRunners: 2                # minimum running pods, required
  maxRunners: 6                # max number of pods, required
  reconciliationPeriod: 1m     # How often it will reconcile, optional, default 1m
  organization: MYORG  # the github org, required
  # repository: "theRepoName"  # if runner for repo, optional
  tokenRef:
    key: GH_TOKEN
    name: actions-runner
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-pool
      containers:
        - name: runner
          env:
            - name: RUNNER_DEBUG
              value: "true"
            - name: DOCKER_TLS_CERTDIR
              value: /certs
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_TLS_VERIFY
              value: "1"
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: ACTIONS_RUNNER_INPUT_LABELS
              value: linux,x64
            - name: ACTIONS_RUNNER_INPUT_RUNNERGROUP
              value: "Internal"
            - name: GH_ORG
              value: MYORG
            # if runner for repo:
            # - name: GH_REPO
            #   value: theRepoName
          envFrom:
            - secretRef:
                name: runner-pool-regtoken
          # find the fixed-in-time tags at https://quay.io/repository/evryfs/github-actions-runner?tab=tags if you want to avoid pulling on a moving tag
          # due to https://github.com/actions/runner/issues/246 the runner sw needs to be recent
          # you can subscribe to release-feeds at https://github.com/evryfs/github-actions-runner/releases.atom
          image: quay.io/evryfs/github-actions-runner:latest
          imagePullPolicy: Always
          resources: {}
          volumeMounts:
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_diag
              name: runner-diag
            - mountPath: /home/runner/_work
              name: runner-work
            # - mountPath: /home/runner/.m2
            #   name: mvn-repo
            # - mountPath: /home/runner/.m2/settings.xml
            #   name: settings-xml
        - name: docker
          env:
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          image: docker:stable-dind
          imagePullPolicy: Always
          args:
            # See linked issues from: https://github.com/evryfs/github-actions-runner-operator/issues/39
            - --mtu=1430
          resources: {}
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/docker
              name: docker-storage
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_work
              name: runner-work
        - name: exporter
          image: quay.io/evryfs/github-actions-runner-metrics:v0.0.3
          ports:
            - containerPort: 3903
              protocol: TCP
          volumeMounts:
            - name: runner-diag
              mountPath: /_diag
              readOnly: true
      volumes:
        - emptyDir: {}
          name: runner-work
        - emptyDir: {}
          name: runner-diag
        - emptyDir: {}
          name: mvn-repo
        - emptyDir: {}
          name: docker-storage
        - emptyDir: {}
          name: docker-certs
        # - configMap:
        #     defaultMode: 420
        #     name: settings-xml
        #   name: settings-xml

aroemen avatar Mar 11 '21 21:03 aroemen

I was able to reproduce it. It's an edge case that occurs when you delete the CR itself. In that case the CR is gone, so the cleanup step that handles finalization, https://github.com/evryfs/github-actions-runner-operator/blob/master/controllers/githubactionrunner_controller.go#L116, is never reached.
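
To confirm that a pod is stuck for this reason, it can help to look at its metadata: a pod stays in "Terminating" for as long as its deletionTimestamp is set and its finalizers list is not empty. A minimal sketch follows; the function name show_pod_finalizers is hypothetical, and the namespace and pod name are whatever your cluster uses.

```shell
# Hypothetical helper: print a pod's deletionTimestamp and remaining finalizers.
# Usage: show_pod_finalizers <namespace> <pod>
show_pod_finalizers() {
  # First output line: when deletion was requested (empty if it was not).
  # Second output line: the finalizers that still block removal of the pod.
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}
```

For the pods in this issue, the second line would be expected to list garo.tietoevry.com/runner-registration, the finalizer named in the original report.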

davidkarlsen avatar Mar 11 '21 22:03 davidkarlsen

What would be another way to tear down these resources then?

aroemen avatar Mar 11 '21 22:03 aroemen

Hi there, I have the same issue here

NAME                    READY   STATUS        RESTARTS   AGE
runner-pool-pod-7qhqc   0/3     Terminating   0          4d6h
runner-pool-pod-d96bw   0/3     Terminating   0          4h38m
runner-pool-pod-w278v   0/3     Terminating   0          4h38m
runner-pool-pod-xbmww   0/3     Terminating   0          4h47m

I can't remove them. Thank you.

duyhenryer avatar Jun 11 '21 09:06 duyhenryer

@aroemen @duyhenryer I was able to delete them by removing the finalizers field. Patch the finalizers list to be null:

kubectl patch pod <POD_NAME> -n <NAMESPACE> -p '{"metadata":{"finalizers":null}}'
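
When several pods are stuck, the same patch can be applied in a loop. This is only a sketch: the function name remove_pod_finalizers is hypothetical, and it clears every finalizer on each named pod, not just garo.tietoevry.com/runner-registration. Since it bypasses the operator's de-registration step, the runners will probably still be listed in GitHub afterwards, as described in the original report.

```shell
# Hypothetical helper: strip all finalizers from the named pods.
# Usage: remove_pod_finalizers <namespace> <pod> [<pod>...]
remove_pod_finalizers() {
  namespace="$1"
  shift
  # Apply the patch from the comment above to each remaining argument.
  for pod in "$@"; do
    kubectl patch pod "$pod" -n "$namespace" -p '{"metadata":{"finalizers":null}}'
  done
}
```

For example: remove_pod_finalizers github-action-runners runner-pool-pod-fhthp runner-pool-pod-wfs62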

gabriellemadden avatar Jun 11 '21 13:06 gabriellemadden

yes, and that's what the operator does after de-registering them from github - which is why I am curious what the operator logs.

davidkarlsen avatar Jun 11 '21 13:06 davidkarlsen

@davidkarlsen I posted the operator logs back in March. Do you need additional data?

aroemen avatar Jun 14 '21 02:06 aroemen

@aroemen sorry, I commented on the wrong issue. I was thinking of https://github.com/evryfs/github-actions-runner-operator/issues/232, which was fixed recently. This one (deleting the CR) still needs to be fixed.

davidkarlsen avatar Jun 14 '21 10:06 davidkarlsen

@aroemen #264 will solve this, as you can scale the pool to zero, then delete the CR.
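
A rough sketch of that sequence, under some loudly stated assumptions: the function name drain_and_delete_pool is hypothetical; the runner pods are assumed to carry the label garo.tietoevry.com/pool=<pool-name> (the key used in the CR's anti-affinity rule above, but not confirmed in this thread); and the installed CRD must accept minRunners: 0. Whether maxRunners also has to be lowered is not stated here, so the sketch leaves it alone.

```shell
# Sketch: scale a runner pool to zero, wait for its pods to go, then delete the CR.
# Usage: drain_and_delete_pool <namespace> <pool-name>
drain_and_delete_pool() {
  namespace="$1"
  pool="$2"
  # Scale down first so the operator can de-register the runners
  # while the CR still exists.
  kubectl patch githubactionrunners.garo.tietoevry.com "$pool" \
    --namespace "$namespace" --type=merge \
    --patch '{"spec":{"minRunners":0}}' || return 1
  # ASSUMPTION: runner pods are labelled garo.tietoevry.com/pool=<pool-name>.
  # Some kubectl versions report an error here if no pod matches the selector.
  kubectl wait --for=delete pod \
    --selector "garo.tietoevry.com/pool=$pool" \
    --namespace "$namespace" --timeout=300s || return 1
  # Only now remove the CR itself.
  kubectl delete githubactionrunners.garo.tietoevry.com "$pool" \
    --namespace "$namespace"
}
```

The ordering is the point of the sketch: the CR is deleted last, so the finalization code path is still reachable while the pods are being removed.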

davidkarlsen avatar Jun 14 '21 10:06 davidkarlsen

Maybe the CR should have finalizer as well.

zhsj avatar Jul 12 '21 17:07 zhsj

I'm trying to make this work on the latest build but can't seem to manage it...
$ kubectl patch githubactionrunners.garo.tietoevry.com runner-pool --namespace actions-runner --patch '{"spec":{"minRunners":0}}' --type=merge
Results in:
The GithubActionRunner "runner-pool" is invalid: spec.minRunners: Invalid value: 0: spec.minRunners in body should be greater than or equal to 1

I suspect that either the image I'm pulling is not the latest, or I'm pulling the image wrong. I'm pulling the operator image using the published helm charts:
helm upgrade --install --wait github-actions-runner-operator evryfs-oss/github-actions-runner-operator --namespace actions-runner-operator --set githubapp.existingSecret=github-runner-app --set githubapp.enabled=true

The runner image is this one: quay.io/evryfs/github-actions-runner:latest

What am I missing?

Thx Tony

tonywildey-valstro avatar Dec 16 '21 19:12 tonywildey-valstro

@tonywildey-valstro you probably don't have the latest CRD: https://raw.githubusercontent.com/evryfs/github-actions-runner-operator/v0.10.0/config/crd/bases/garo.tietoevry.com_githubactionrunners.yaml
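
One way to pick up that CRD is to apply the file directly with kubectl. A minimal sketch, with a hypothetical function name; it assumes cluster-admin rights and that the v0.10.0 CRD is the one you want:

```shell
# Hypothetical helper: apply the GithubActionRunner CRD from the v0.10.0 tag.
update_garo_crd() {
  kubectl apply -f "https://raw.githubusercontent.com/evryfs/github-actions-runner-operator/v0.10.0/config/crd/bases/garo.tietoevry.com_githubactionrunners.yaml"
}
```

After that, the minRunners patch to 0 should pass validation, provided the applied CRD contains that change.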

davidkarlsen avatar Dec 30 '21 22:12 davidkarlsen

Ah, there we go. I installed using the helm chart, whose CRD (https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/crds/garo.tietoevry.com_githubactionrunners.yaml) does not have the minRunners change.

Thx Tony

tonywildey-valstro avatar Jan 03 '22 17:01 tonywildey-valstro