istio-csr

istio-csr pod healthz check fails for a long time in v0.11.0 and v0.12.0

Open Jinn22 opened this issue 1 year ago • 2 comments

With istio-csr v0.11.0 and v0.12.0, the pod's healthz check fails for more than 10 minutes and then becomes healthy on its own. There was no health check issue with istio-csr v0.9.0 and v0.10.0.

Logs from the istio-csr pod:

I1028 17:10:56.020875       1 tls.go:209] "initial serving certificate didn't complete in 10s; will retry" logger="tls-provider"
I1028 17:11:01.117668       1 healthz.go:60] "healthz check failed" logger="controller-runtime.healthz" checker="tls_provider" error="not ready"
I1028 17:11:01.117713       1 healthz.go:60] "healthz check failed" logger="controller-runtime.healthz" checker="grpc_server" error="not ready"
I1028 17:11:01.117742       1 healthz.go:128] "healthz check failed" logger="controller-runtime.healthz" statuses=[{},{}]
...

After ~10 minutes, the istio-csr pod became healthy without any intervention:

I1028 17:22:02.619721       1 tls.go:209] "initial serving certificate didn't complete in 10s; will retry" logger="tls-provider"
I1028 17:22:06.117344       1 healthz.go:60] "healthz check failed" logger="controller-runtime.healthz" checker="tls_provider" error="not ready"
I1028 17:22:06.117370       1 healthz.go:60] "healthz check failed" logger="controller-runtime.healthz" checker="grpc_server" error="not ready"
I1028 17:22:06.117393       1 healthz.go:128] "healthz check failed" logger="controller-runtime.healthz" statuses=[{},{}]
I1028 17:22:10.401008       1 tls.go:409] "serving certificate ready" logger="tls-provider"
2024-10-28T17:22:10.401160Z	info	spiffe	Added 2 certs to trust domain cluster.local in peer cert verifier
I1028 17:22:10.401964       1 tls.go:221] "fetched initial serving certificate" logger="tls-provider"
I1028 17:22:10.401998       1 tls.go:229] "waiting to renew certificate" logger="tls-provider" renewal-time="2024-10-29 09:22:10.133999379 +0000 UTC m=+58299.711391525"
I1028 17:22:10.477926       1 server.go:171] "grpc serving" logger="grpc-server" serving-addr="0.0.0.0:6443" address="[::]:6443"

At the moment the istio-csr healthz check starts passing, the cert-manager pod logs that the istio-csr CertificateRequest is ready. cert-manager has no logs related to the istio-csr certificate request between 17:10 and 17:22:

I1028 17:22:10.384195       1 conditions.go:263] Setting lastTransitionTime for CertificateRequest "istio-csr-9zvwz" condition "Ready" to 2024-10-28 17:22:10.384184505 +0000 UTC m=+24633.856346711

The istio-csr values file:

app:
    certmanager:
      issuer:
        name: ""
        kind: ""
        group: ""
    runtimeIssuanceConfigMap: runtime-config-map
    tls:
      istiodCertificateEnable: "dynamic"

istio-csr: v0.11.0, v0.12.0
istio version: 1.22.1
cert-manager version: 1.14.5
Vault as external issuer
istiod-dynamic certificate was created successfully

Jinn22, Oct 28 '24 18:10

Had to set app.certmanager.issuer.enabled=false to fix this.
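
For reference, a minimal sketch of how that setting might look when merged into the values file from the original report. This assumes `enabled` sits next to `name`, `kind` and `group` under `app.certmanager.issuer`, as the dotted path suggests; everything else is copied unchanged from the report, and this layout is not confirmed against the chart.

```yaml
app:
    certmanager:
      issuer:
        # Assumption: key placement inferred from the path
        # app.certmanager.issuer.enabled mentioned in this comment.
        enabled: false
        name: ""
        kind: ""
        group: ""
    runtimeIssuanceConfigMap: runtime-config-map
    tls:
      istiodCertificateEnable: "dynamic"
```

The same value could presumably be passed on the command line as `--set app.certmanager.issuer.enabled=false` when installing or upgrading the chart.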

Jinn22, Nov 04 '24 13:11

My setup is similar, except I have two istio-csr pods: one of them never becomes healthy, while the other takes about 10 minutes.

@Jinn22 can you elaborate on what helped you, please?

Va1, Dec 11 '24 15:12