Scheduler failed to retrieve initial identity certificate
K8s 1.31, Dapr 1.14.4
Probably need some pointers to help investigate.
Actual Behavior
Scheduler and Operator fail to start even though Sentry is running:

NAME                                    READY   STATUS             RESTARTS         AGE
dapr-operator-6c677db88d-hk4t7          0/1     CrashLoopBackOff   23 (2m15s ago)   53m
dapr-placement-server-0                 0/1     CrashLoopBackOff   21 (3m54s ago)   53m
dapr-scheduler-server-0                 0/1     CrashLoopBackOff   21 (4m24s ago)   53m
dapr-sentry-968cd4bf-gkngv              1/1     Running            0                53m
dapr-sidecar-injector-bc865f6fd-p2gjh   0/1     CrashLoopBackOff   23 (114s ago)    53m
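To pull the failing pods out of a listing like the one above, a minimal Python sketch follows. It assumes the `kubectl get pods` output has been saved as text with one pod per line; `parse_pods` and `not_ready` are hypothetical helper names, not part of Dapr or kubectl.

```python
# Sample listing copied from the issue, one pod per line.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
dapr-operator-6c677db88d-hk4t7 0/1 CrashLoopBackOff 23 (2m15s ago) 53m
dapr-placement-server-0 0/1 CrashLoopBackOff 21 (3m54s ago) 53m
dapr-scheduler-server-0 0/1 CrashLoopBackOff 21 (4m24s ago) 53m
dapr-sentry-968cd4bf-gkngv 1/1 Running 0 53m
dapr-sidecar-injector-bc865f6fd-p2gjh 0/1 CrashLoopBackOff 23 (114s ago) 53m
"""


def parse_pods(text):
    """Parse whitespace-separated `kubectl get pods` rows into dicts."""
    pods = []
    for line in text.strip().splitlines():
        tokens = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(tokens) < 5 or tokens[0] == "NAME":
            continue
        ready, total = tokens[1].split("/")
        pods.append({
            "name": tokens[0],
            "ready": int(ready),
            "total": int(total),
            "status": tokens[2],
            # RESTARTS may look like "23 (2m15s ago)"; the count is the first token.
            "restarts": int(tokens[3]),
            "age": tokens[-1],
        })
    return pods


def not_ready(pods):
    """Return names of pods with fewer ready containers than expected."""
    return [p["name"] for p in pods if p["ready"] < p["total"]]


for name in not_ready(parse_pods(SAMPLE)):
    print(name)
```

On the listing above this reports four pods (operator, placement, scheduler and sidecar injector), so the failure is not limited to the Scheduler and Operator.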
Sentry Log
time="2024-09-26T13:42:57.54902543Z" level=info msg="Adding validator 'kubernetes' with Sentry ID: spiffe://cluster.local/ns/dapr-system/dapr-sentry" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry type=log ver=1.14.4
time="2024-09-26T13:42:57.549047798Z" level=info msg="Using kubernetes secret store for trust bundle storage" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry.ca type=log ver=1.14.4
time="2024-09-26T13:42:57.558740202Z" level=info msg="Root and issuer certs found: using existing certs" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry.ca type=log ver=1.14.4
time="2024-09-26T13:42:57.558781964Z" level=info msg="CA certificate key pair ready" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry type=log ver=1.14.4
time="2024-09-26T13:42:57.558812524Z" level=info msg="Using validator 'kubernetes'" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry type=log ver=1.14.4
time="2024-09-26T13:42:57.558970057Z" level=info msg="Healthz server is listening on [::]:8080" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry type=log ver=1.14.4
time="2024-09-26T13:42:57.559002875Z" level=info msg="Fetching initial identity certificate" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T13:42:57.559000178Z" level=info msg="metrics server started on :9090/" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry type=log ver=1.14.4
2024/09/26 13:42:57 Failed to export to Prometheus: cannot register the collector: duplicate metrics collector registration attempted
time="2024-09-26T13:42:57.559405363Z" level=info msg="Security is initialized successfully" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T13:42:57.559422677Z" level=info msg="Starting workload cert expiry watcher; current cert expires on: 2024-09-27 13:42:57 +0000 UTC, renewing at 2024-09-27 01:35:27 +0000 UTC" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T13:42:57.559492581Z" level=info msg="Running gRPC server on port 50001" instance=dapr-sentry-968cd4bf-gkngv scope=dapr.sentry.server type=log ver=1.14.4
Scheduler Log
time="2024-09-26T14:37:27.595960435Z" level=warning msg="--etcd-space-quota of 1Gi may be too low for production use. Consider increasing the value to 16Gi or larger." instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:27.596121517Z" level=info msg="Starting Dapr Scheduler Service -- version 1.14.4 -- commit 583960dc90120616124b60ad2b7820fc0b3edf44" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:27.596136028Z" level=info msg="Log level set to: info" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:27.59655522Z" level=info msg="Fetching initial identity certificate" instance=dapr-scheduler-server-0 scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T14:37:27.596697394Z" level=info msg="Trust anchors file '/var/run/secrets/dapr.io/tls/ca.crt' found" instance=dapr-scheduler-server-0 scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T14:37:27.596841833Z" level=info msg="metrics server started on :9090/" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:27.596867483Z" level=info msg="Healthz server is listening on [::]:8080" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:27.597215976Z" level=info msg="Watching trust anchors file '/var/run/secrets/dapr.io/tls/ca.crt' for changes" instance=dapr-scheduler-server-0 scope=dapr.runtime.security type=log ver=1.14.4
time="2024-09-26T14:37:51.390518821Z" level=info msg="Received signal 'terminated'; beginning shutdown" instance=dapr-scheduler-server-0 scope=dapr.signals type=log ver=1.14.4
time="2024-09-26T14:37:51.390631982Z" level=info msg="Healthz server is shutting down" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
time="2024-09-26T14:37:51.393024679Z" level=fatal msg="error running scheduler: failed to retrieve the initial identity certificate: error establishing connection to sentry: context canceled: connection error: desc = "transport: Error while dialing: dial tcp: lookup dapr-sentry.default.svc.cluster.local: i/o timeout"" instance=dapr-scheduler-server-0 scope=dapr.scheduler type=log ver=1.14.4
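One detail in the two logs may be worth checking: Sentry registers its ID under the namespace dapr-system, while the Scheduler times out looking up dapr-sentry.default.svc.cluster.local. A minimal Python sketch for comparing the two is below. It assumes the relevant log fragments are available as strings; the function names are hypothetical and not part of Dapr. A mismatch is only a hint, since the reported error is a DNS timeout and cluster DNS itself may also be unreachable.

```python
import re

# Fragments copied from the Sentry and Scheduler logs in the issue.
SENTRY_LINE = "Sentry ID: spiffe://cluster.local/ns/dapr-system/dapr-sentry"
SCHEDULER_LINE = "dial tcp: lookup dapr-sentry.default.svc.cluster.local: i/o timeout"


def sentry_namespace(log_text):
    """Return the namespace embedded in a SPIFFE ID, or None if absent."""
    match = re.search(r"spiffe://[^/\s]+/ns/([^/\s\"]+)/", log_text)
    return match.group(1) if match else None


def lookup_namespace(log_text):
    """Return the namespace of the <svc>.<ns>.svc... host in a DNS lookup error."""
    match = re.search(r"lookup ([\w.-]+)", log_text)
    if not match:
        return None
    labels = match.group(1).rstrip(".").split(".")
    # Kubernetes service names follow <service>.<namespace>.svc.<cluster-domain>.
    if len(labels) >= 3 and labels[2] == "svc":
        return labels[1]
    return None


sentry_ns = sentry_namespace(SENTRY_LINE)
target_ns = lookup_namespace(SCHEDULER_LINE)
if sentry_ns != target_ns:
    print(f"Sentry runs in {sentry_ns!r} but the Scheduler dials namespace {target_ns!r}")
```

If the namespaces differ, the Sentry address the control plane services were configured with is a reasonable place to start looking.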
I would say it worked as expected. Please make sure istiod is ready before creating any gateway or sidecar.
Related: https://github.com/istio/istio/issues/35789
I resolved it with Istio 1.24.2 installed in ambient mesh (sidecar-less) mode. The control plane's kube-apiserver needs to reach istiod on port 15017, which is exposed through port 443 of the istiod service. This is necessary for istiod's MutatingWebhook to automatically replace the placeholder tag "auto" with a tag that matches the istiod version. My comment on #35789 might be helpful.
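A minimal Python sketch of a TCP reachability check for the webhook port is below. It is only an approximation: the connection that matters is the one from kube-apiserver to istiod, so a check run from a pod or a workstation can rule out some failures but cannot confirm the control plane path. `can_connect` is a hypothetical helper, and the service name in the comment is an assumed default install name, not something stated in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call (assumed service name; adjust to the actual install):
# can_connect("istiod.istio-system.svc", 443)
```

A False result from inside the cluster suggests a network policy, firewall rule or DNS problem in front of the webhook; a True result does not by itself prove that kube-apiserver can reach it.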
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2024-10-11. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.
Created by the issue and PR lifecycle manager.