actions-runner-controller

Deploying Runner: failed to call webhook

Open ltatakis-optaxe opened this issue 1 year ago • 4 comments

Checks

  • [X] I've already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I'm not using a custom entrypoint in my runner image

Controller Version

0.27.4

Helm Chart Version

Not used

CertManager Version

v1.12.0

Deployment Method

Other

cert-manager installation

We install cert-manager from a single static YAML manifest (no templating). It begins as follows:

apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
---
# Source: cert-manager/templates/crds.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: certificaterequests.cert-manager.io
  labels:
    app: 'cert-manager'
    app.kubernetes.io/name: 'cert-manager'
    app.kubernetes.io/instance: 'cert-manager'
    # Generated labels
    app.kubernetes.io/version: "v1.12.0"
  ....

All cert-manager pods are running correctly:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5f68c9c6dd-msjjk              1/1     Running   0          71m
cert-manager-cainjector-57d6fc9f7d-ps9z4   1/1     Running   0          71m
cert-manager-webhook-5b7ffbdc98-k8l9s      1/1     Running   0          71m
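
Running pods do not by themselves show that cert-manager's CA injector filled in the `caBundle` on ARC's webhook configuration. A minimal sketch of that check follows. It assumes the configuration is named `mutating-webhook-configuration` (the name used in the workaround later in this thread; override `WEBHOOK_CONFIG` if yours differs) and that `kubectl` and a `base64` supporting `-d` are on PATH. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a mutating webhook configuration carries a CA bundle
# that decodes to a PEM certificate. The default name is an assumption.
WEBHOOK_CONFIG="${WEBHOOK_CONFIG:-mutating-webhook-configuration}"

# Returns 0 if the base64 string in $1 decodes to PEM certificate text.
is_pem_ca_bundle() {
  [ -n "$1" ] || return 1
  printf '%s' "$1" | base64 -d 2>/dev/null | grep -q -- '-----BEGIN CERTIFICATE-----'
}

# Reads the caBundle of the first webhook in the configuration and reports.
check_ca_bundle() {
  local bundle
  bundle=$(kubectl get mutatingwebhookconfiguration "$WEBHOOK_CONFIG" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}') || return 1
  if is_pem_ca_bundle "$bundle"; then
    echo "caBundle on $WEBHOOK_CONFIG looks populated"
  else
    echo "caBundle on $WEBHOOK_CONFIG is empty or not a PEM certificate" >&2
    return 1
  fi
}
```

After sourcing, `check_ca_bundle` prints a one-line verdict. A missing or wrong bundle typically surfaces as an `x509:` TLS error rather than a timeout, so this mainly rules cert-manager in or out.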

Checks

  • [X] This isn't a question or user support case. (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical enough that you need priority support.)
  • [X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
  • [X] My actions-runner-controller version (v0.x.y) does support the feature
  • [X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
  • [X] I've migrated to the workflow job webhook event (if you are using webhook-driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gh-runner
  namespace: actions-runner-system
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                - github-runners-562b
      organization: myorg
      labels:
        - gke-runner

To Reproduce

1. Brand-new cluster with no previous deployments
2. Cluster includes Google ACM
3. Deployed with `kubectl create -f https://github.com/actions/actions-runner-controller/releases/download/v0.27.4/actions-runner-controller.yaml`
4. Controller is running:

NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-58d69dfdf8-sd7lj   2/2     Running   0          50m


5. Apply the `RunnerDeployment` yaml in `runner.yaml`
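
Step 5 can be retried without creating anything by using a server-side dry run: `kubectl apply --dry-run=server` still sends the object through admission, so it should exercise the same mutating webhook. A small sketch; `dry_run_runner` is a hypothetical helper and `runner.yaml` is the file from step 5.

```shell
#!/usr/bin/env bash
# Sketch: send a manifest through server-side admission without persisting
# it, and report how long the API server took to answer.
dry_run_runner() {
  local manifest="${1:-runner.yaml}"
  local start end status
  start=$(date +%s)
  kubectl apply --dry-run=server -f "$manifest"
  status=$?
  end=$(date +%s)
  echo "dry run exited with $status after $((end - start))s"
  return "$status"
}
```

If the dry run also fails after about ten seconds, that is consistent with the `timeout=10s` in the webhook URL rather than with a problem in the manifest.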

Describe the bug

When I apply the runner YAML on a newly created cluster, I get the following error:

Error message:

Error from server (InternalError): error when creating "runner.yaml": Internal error occurred: failed calling webhook "mutate.runnerdeployment.actions.summerwind.dev": failed to call webhook: Post "https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment?timeout=10s": context deadline exceeded

No resources are created; `kubectl get RunnerDeployment` returns nothing.
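
The telling part of this error is `context deadline exceeded`: the API server tried to POST to `webhook-service.actions-runner-system.svc:443` and got no reply within the 10-second timeout, which generally points at connectivity between the control plane and the controller pod rather than at the RunnerDeployment spec. A rough triage sketch follows; the matched substrings are common Kubernetes and Go error texts, but the mapping to causes is a heuristic and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic triage of an admission-webhook error message passed as $1.
# The mapping from error text to likely cause is a rough guide only.
classify_webhook_error() {
  case "$1" in
    *"context deadline exceeded"*|*"i/o timeout"*)
      echo "timeout: no reply from the webhook pod (network/firewall path)" ;;
    *"no endpoints available for service"*)
      echo "no-endpoints: webhook Service has no ready pods behind it" ;;
    *"connection refused"*)
      echo "refused: pod reachable but nothing listening on the target port" ;;
    *"x509:"*)
      echo "tls: CA bundle or serving certificate mismatch (check cert-manager)" ;;
    *)
      echo "unknown" ;;
  esac
}

# Prints "<service> <namespace> <port>" from the first https:// URL in $1.
# Assumes the URL has the form https://<svc>.<ns>.svc:<port>/<path>.
webhook_target() {
  local url host svc rest ns port
  url=$(printf '%s\n' "$1" | grep -o 'https://[^"]*' | head -n 1)
  host=${url#https://}   # drop the scheme
  host=${host%%/*}       # drop path and query string
  port=${host##*:}       # text after the last colon
  host=${host%:*}        # drop the port
  svc=${host%%.*}        # first DNS label
  rest=${host#*.}
  ns=${rest%%.*}         # second DNS label
  printf '%s %s %s\n' "$svc" "$ns" "$port"
}
```

Applied to the message above, `webhook_target` yields `webhook-service actions-runner-system 443`, after which `kubectl get endpoints webhook-service -n actions-runner-system` shows whether the controller pod is registered behind the Service. If it is, a timeout suggests the path from the control plane to the pod is blocked; on a private GKE cluster, for instance, the control plane can by default reach nodes on only a few ports.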

We've reviewed https://github.com/actions/actions-runner-controller/pull/1558 and checked the following:

  • No mutating or validating webhook configurations with "actions" in the name exist
  • Services exist in the actions-runner-system namespace:
controller-manager-metrics-service   ClusterIP   192.1xx.xx.xxx   <none>        8443/TCP   27m
webhook-service                      ClusterIP   192.1xx.xx.xxx   <none>        443/TCP    27m
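
Configuration names need not contain "actions" (the static manifest's defaults appear to be plain `mutating-webhook-configuration` and `validating-webhook-configuration`), yet the error shows that a mutating webhook is still registered for RunnerDeployments. A sketch that finds configurations by the Service they call instead of by name; the nested jsonpath `range` is standard `kubectl` syntax, while the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Prints "<kind> <config-name> <namespace>/<service>" for every webhook entry
# in all mutating and validating webhook configurations.
list_webhook_services() {
  local kind
  for kind in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    kubectl get "$kind" -o jsonpath='{range .items[*]}{.metadata.name}{range .webhooks[*]}{" "}{.clientConfig.service.namespace}/{.clientConfig.service.name}{end}{"\n"}{end}' |
      awk -v k="$kind" 'NF { for (i = 2; i <= NF; i++) print k, $1, $i }'
  done
}

# Keeps only lines (read from stdin) whose third field equals $1,
# e.g. "actions-runner-system/webhook-service".
filter_by_service() {
  awk -v target="$1" '$3 == target'
}
```

`list_webhook_services | filter_by_service actions-runner-system/webhook-service` then lists every configuration that sends requests to ARC's webhook Service.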

Describe the expected behavior

The RunnerDeployment should be created and run successfully.

Whole Controller Logs

https://gist.github.com/ltatakis-optaxe/16d94dc7fb0e97e9c661bc79a6e9393f

Whole Runner Pod Logs

There is no runner pod to provide logs; that is the issue.

Additional Context

No response

ltatakis-optaxe, Jun 05 '23 12:06

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

github-actions[bot], Jun 05 '23 12:06

This workaround worked for me:

kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration
kubectl delete mutatingwebhookconfiguration actions-runner-controller-mutating-webhook-configuration
kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete validatingwebhookconfiguration actions-runner-controller-validating-webhook-configuration

helm uninstall actions-runner-controller -n actions-runner-system

helm upgrade --install actions-runner-controller actions-runner-controller/actions-runner-controller \
    --namespace actions-runner-system \
    --wait \
    --set=authSecret.create=true \
    --set=authSecret.github_token="****" \
    --set=runnerGithubURL="https://github.com/***"
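
The `kubectl delete` lines above appear to cover both the static-manifest and the Helm-chart names for each configuration, so on any one install some of them will report NotFound. A sketch of the same cleanup that tolerates missing objects via `--ignore-not-found`; the names are copied from the commands above and the function name is hypothetical. It removes only the webhook configurations and does not reinstall anything.

```shell
#!/usr/bin/env bash
# Sketch: delete the webhook configurations named in the workaround,
# skipping any that do not exist on this cluster.
cleanup_arc_webhooks() {
  local name
  for name in mutating-webhook-configuration actions-runner-controller-mutating-webhook-configuration; do
    kubectl delete mutatingwebhookconfiguration "$name" --ignore-not-found
  done
  for name in validating-webhook-configuration actions-runner-controller-validating-webhook-configuration; do
    kubectl delete validatingwebhookconfiguration "$name" --ignore-not-found
  done
}
```

Until the controller is reinstalled and its webhook configurations are recreated, the API server skips ARC's defaulting and validation entirely.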



davidwincent, Jun 18 '23 11:06

Same error. The proposed workaround did not work for me.

extravio, Nov 01 '23 20:11

Has anyone solved this issue?

rtsisyk, Nov 10 '23 14:11