testkube
`webhook-cert-patch` fails with `connect: connection refused` - `error getting secret`
Describe the bug
When installing Testkube Helm chart version 1.16.17 in a cluster that previously had the chart v1.14.0 installed, we see an error from the `webhook-cert-patch` job.
W1206 19:53:22.658205 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
{"err":"Get \"https://192.168.248.1:443/api/v1/namespaces/testkube-system/secrets/webhook-server-cert\": dial tcp 192.168.248.1:443: connect: connection refused","level":"fatal","msg":"error getting secret","source":"k8s/k8s.go:351","time":"2023-12-06T19:53:22Z"}
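For triage, it may help to separate a network-level failure from an authorization failure, since the two point at different fixes. Below is a minimal sketch that parses the JSON log line above and classifies its `err` field. The function name `classify_failure` and the substring heuristics are assumptions for illustration, not part of Testkube or kube-webhook-certgen.

```python
import json

# Log line copied verbatim from the failing webhook-cert-patch pod.
LOG_LINE = r'''{"err":"Get \"https://192.168.248.1:443/api/v1/namespaces/testkube-system/secrets/webhook-server-cert\": dial tcp 192.168.248.1:443: connect: connection refused","level":"fatal","msg":"error getting secret","source":"k8s/k8s.go:351","time":"2023-12-06T19:53:22Z"}'''


def classify_failure(log_line: str) -> str:
    """Roughly classify a JSON log entry by its "err" text.

    The substrings below are heuristic assumptions: transport errors
    usually mention the dial failure, while API-server RBAC denials
    usually contain "forbidden".
    """
    entry = json.loads(log_line)
    err = entry.get("err", "").lower()
    if any(s in err for s in ("connection refused", "i/o timeout", "no route to host")):
        return "network"
    if "forbidden" in err or "unauthorized" in err:
        return "rbac"
    return "unknown"


print(classify_failure(LOG_LINE))
```

Under these assumptions the line above is classified as `network`: the request never reached the API server at `192.168.248.1:443`, so this particular message does not by itself show a missing permission.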
To Reproduce
Steps to reproduce the behavior:
- Use `kustomize build --enable-helm` to template out the testkube v1.16.17 helm chart
- Then apply the results to the cluster that already has manifests applied from the v1.14.0 helm chart (also templated via `kustomize`)
- Wait for the `webhook-cert-patch` pod to be created and then error.
Expected behavior
The `webhook-cert-patch` job should complete successfully.
Version / Cluster
- Which testkube version? 1.16
- What Kubernetes cluster? AKS
- What Kubernetes version? 1.26
Additional context
We are setting the `jobServiceAccountName` helm value, but I do not believe this should impact this non-test-related job.
I did not see this issue when applying the manifests to a brand-new cluster (on my local system), but I'm assuming that this is because the webhook does not need to be patched in this use case.
I am guessing that this issue is because the service account for that job does not have all the rights it needs, but I am not 100% sure.
cc/ @manidharanupoju24
To add to the above context, this failed job `webhook-cert-patch` is causing the testtriggers to fail.
Hey @spkane, v1.1.* is a really old version; we might even have used cert-manager at that time. It sounds like something is missing, such as RBAC permissions, because the job fails with `error getting secret`.
@vsukhin Sorry, that was a typo (corrected). The initial version is 1.14.0.
From what I can tell, this only happens during an upgrade and not a clean install.
@ypoplavs @dejanzele any ideas?
Hello @spkane @manidharanupoju24,
We use `kube-webhook-certgen` for generating a self-signed certificate and patching the CRDs and WebhookConfiguration objects.
It has two steps: generate & patch.
The patch step requires a service account with the following RBAC permissions: https://github.com/kubeshop/helm-charts/blob/develop/charts/testkube-operator/templates/role.yaml#L404-L447
Can you check whether your service account has all of these permissions?
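One way to run that check is with `kubectl auth can-i`, impersonating the service account. The sketch below only builds the command lines so they can be reviewed before running. The service account name `my-patch-sa` is a hypothetical placeholder, and the verb/resource pairs are illustrative assumptions based on the secret in the error log and the objects mentioned above; the linked `role.yaml` remains the authoritative list.

```python
import shlex


def can_i_command(verb, resource, service_account, namespace, namespaced=True):
    """Build a `kubectl auth can-i` argv that impersonates a service account."""
    argv = [
        "kubectl", "auth", "can-i", verb, resource,
        f"--as=system:serviceaccount:{namespace}:{service_account}",
    ]
    if namespaced:
        # Cluster-scoped resources are checked without a namespace.
        argv += ["--namespace", namespace]
    return argv


# Illustrative checks only (assumptions, not copied from the chart's role.yaml).
CHECKS = [
    ("get", "secrets", True),
    ("update", "validatingwebhookconfigurations", False),
    ("update", "customresourcedefinitions", False),
]

for verb, resource, namespaced in CHECKS:
    cmd = can_i_command(verb, resource, "my-patch-sa", "testkube-system", namespaced)
    print(shlex.join(cmd))
```

Each printed command answers `yes` or `no` when run against the cluster, which makes it easy to spot a single missing rule.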
Kind regards
@dejanzele Yes. I'll try to check the permissions in the next day or so.
Does this job use the `jobServiceAccountName` service account that can be set via helm (which we are using for our test jobs)? When I looked at the `webhook-cert-patch` job last week, I thought that it used another service account that is defined inside the Helm chart, and should therefore have all the permissions that it requires.
@spkane can you please paste the logs from the webhook jobs (both create & patch jobs)?