Pod startup failure

Open RomanOrlovskiy opened this issue 1 year ago • 4 comments

Describe the bug: The pod fails to start during the initial deployment with the latest v3.4.0 Helm chart. Could this be related to the Kubernetes version?

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
.....
  Normal   Created    3m7s (x2 over 3m19s)   kubelet            Created container kubernetes-secret-generator
  Normal   Started    3m7s (x2 over 3m19s)   kubelet            Started container kubernetes-secret-generator
  Normal   Pulled     3m7s                   kubelet            Successfully pulled image "quay.io/mittwald/kubernetes-secret-generator:latest" in 74.165988ms (74.182482ms including waiting)
  Warning  Unhealthy  2m55s (x8 over 3m13s)  kubelet            Readiness probe failed: Get "http://10.8.11.223:8080/readyz": dial tcp 10.8.11.223:8080: connect: connection refused
  Warning  Unhealthy  2m55s (x6 over 3m13s)  kubelet            Liveness probe failed: Get "http://10.8.11.223:8080/healthz": dial tcp 10.8.11.223:8080: connect: connection refused
  Normal   Killing    2m55s (x2 over 3m7s)   kubelet            Container kubernetes-secret-generator failed liveness probe, will be restarted
  Normal   Pulling    2m54s (x3 over 3m19s)  kubelet            Pulling image "quay.io/mittwald/kubernetes-secret-generator:latest"

These are the only logs available from the pod:

{"level":"info","ts":1681322039.958661,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1681322039.9587452,"logger":"cmd","msg":"Go Version: go1.15.15"}
{"level":"info","ts":1681322039.9587672,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1681322039.9587784,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1681322039.9592156,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1681322049.27793,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1681322049.2779632,"logger":"leader","msg":"Continuing as the leader."}

To Reproduce: Just a basic installation using Helm.

values.yaml:

installCRDs: true
useMetricsService: true

Environment:

  • Kubernetes version: EKS 1.25
  • kubernetes-secret-generator version: v3.1.0, v3.4.0, latest.

kubectl version:
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:36:43Z", GoVersion:"go1.19", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.6-eks-48e63af", GitCommit:"9f22d4ae876173884749c0701f01340879ab3f95", GitTreeState:"clean", BuildDate:"2023-01-24T19:19:02Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

RomanOrlovskiy, Apr 12 '23 18:04

We have seen this startup failure too, but only momentarily. It made no sense, since "we did not change anything (tm)".

It later turned out that this happened because one of our APIServices had become unavailable (in our case linkerd-tap, because its pods ran into an issue). You can check with kubectl get apiservices.apiregistration.k8s.io. This did not affect any other workload on the cluster. I honestly do not understand why it causes secret-generator to hang; it definitely should not.
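
On a cluster with many APIServices, the unavailable one can be picked out by filtering the JSON output of kubectl get apiservices -o json. The following is only a minimal sketch and not part of kubernetes-secret-generator: the function name unavailable_apiservices and the sample data are hypothetical, and it assumes the usual status.conditions layout in which the condition of type Available has status "True" or "False".

```python
import json


def unavailable_apiservices(doc):
    """Return (name, reason, message) for each APIService that is not Available."""
    broken = []
    for item in doc.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        conditions = item.get("status", {}).get("conditions", [])
        # Pick the condition of type "Available", if the object reports one.
        available = next(
            (c for c in conditions if c.get("type") == "Available"), None
        )
        if available is None or available.get("status") != "True":
            details = available or {}
            broken.append((
                name,
                details.get("reason", "NoAvailableCondition"),  # hypothetical fallback label
                details.get("message", ""),
            ))
    return broken


# Hypothetical sample shaped like `kubectl get apiservices -o json` output.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "v1.apps"},
     "status": {"conditions": [{"type": "Available", "status": "True"}]}},
    {"metadata": {"name": "v1alpha1.tap.example.io"},
     "status": {"conditions": [{"type": "Available", "status": "False",
                                "reason": "MissingEndpoints",
                                "message": "no endpoints for the backing service"}]}}
  ]
}
""")

for name, reason, message in unavailable_apiservices(SAMPLE):
    print(f"{name}: {reason}: {message}")
```

To run it against a real cluster, replace SAMPLE with json.load(sys.stdin) (after import sys) and pipe the kubectl output into the script.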

Ideas?

jan-kantert, Oct 27 '23 15:10

We looked into this issue some more. It seems to be a bug in the (old) version of operator-sdk. I guess an update would fix it.

jan-kantert, Nov 23 '23 14:11

Is there a workaround for this issue?

vmartino, Feb 07 '24 18:02

Workaround: fix all of your webhooks ;-). For us, this only happens when other webhooks are broken.
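
To find out which services those webhooks depend on, one option is to list the backing Service of each webhook. This is a rough sketch under an assumption the thread does not confirm, namely that "webhooks" means admission webhook configurations (ValidatingWebhookConfiguration and MutatingWebhookConfiguration). The function name webhook_backends and the sample data are hypothetical.

```python
def webhook_backends(doc):
    """Return (config, webhook, 'namespace/service', failurePolicy) rows."""
    rows = []
    for item in doc.get("items", []):
        config = item.get("metadata", {}).get("name", "<unknown>")
        for hook in item.get("webhooks", []):
            service = hook.get("clientConfig", {}).get("service")
            if service is None:
                continue  # webhook is addressed by URL, not by an in-cluster Service
            rows.append((
                config,
                hook.get("name", "<unknown>"),
                f"{service.get('namespace')}/{service.get('name')}",
                # admissionregistration.k8s.io/v1 defaults failurePolicy to "Fail".
                hook.get("failurePolicy", "Fail"),
            ))
    return rows


# Hypothetical sample shaped like the output of
# `kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json`.
SAMPLE_WEBHOOKS = {
    "items": [
        {
            "metadata": {"name": "example-validator"},
            "webhooks": [
                {
                    "name": "validate.example.io",
                    "clientConfig": {
                        "service": {"namespace": "example", "name": "example-webhook"}
                    },
                    "failurePolicy": "Fail",
                }
            ],
        }
    ]
}

for config, hook, backend, policy in webhook_backends(SAMPLE_WEBHOOKS):
    print(f"{config} {hook} -> {backend} (failurePolicy={policy})")
```

Each namespace/service pair printed this way can then be checked by hand, for example with kubectl get endpoints in that namespace, to see whether anything is actually serving the webhook.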

jan-kantert, Feb 19 '24 08:02