TLS handshake timeout in identity pod
Is there an existing issue for this?
- [X] I have searched the existing issues
What is the issue?
I'm trying to install Linkerd using the Helm chart, but the linkerd-identity pod crashes, and its log shows
time="2021-12-11T16:04:23Z" level=info msg="running version edge-21.12.2"
time="2021-12-11T16:04:33Z" level=fatal msg="Failed to initialize identity service: Post \"https://10.32.0.1:443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews\": net/http: TLS handshake timeout"
The other two pods, linkerd-destination and linkerd-proxy-injector, are stuck in ContainerCreating.
I also tried the stable release, 2.11.1, with and without the Helm chart, but the result is the same.
How can it be reproduced?
helm install \
--set-file identityTrustAnchorsPEM=ca.crt \
--set-file identity.issuer.tls.crtPEM=issuer.crt \
--set-file identity.issuer.tls.keyPEM=issuer.key \
linkerd/linkerd2
Logs, error output, etc
time="2021-12-11T16:04:23Z" level=info msg="running version edge-21.12.2"
time="2021-12-11T16:04:33Z" level=fatal msg="Failed to initialize identity service: Post \"https://10.32.0.1:443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews\": net/http: TLS handshake timeout"
Output of linkerd check -o short:
Linkerd core checks
===================
linkerd-existence
-----------------
× control plane pods are ready
No running pods for "linkerd-destination"
see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints
Status check results are ×
Environment
- Kubernetes: 1.23.0 (and 1.22.3)
- Cluster: Scaleway
- OS: Linux
- Linkerd: edge-21.12.2 (and stable-2.11.1)
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
No response
@tensor5 How did you create the CA and issuer certs that you are installing with? It sounds like there may be an issue with their creation which is why the identity controller is failing to start up.
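For reference, the cert generation described in the Linkerd Helm install docs uses the step CLI roughly like this (just a sketch; the file names match the ones passed to helm install above):

step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

The issuer cert must be an intermediate CA signed by the trust anchor, with identity.linkerd.cluster.local as its name; a mismatch there is one common cause of identity startup failures.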
It's also possible that there is an issue with connecting to the k8s API. The log line Failed to initialize identity service: Post "https://10.32.0.1:443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews": net/http: TLS handshake timeout indicates the request to the API server is timing out during the TLS handshake, which could be the problem here.
Do you have a way to confirm that Pods on your cluster can communicate with the k8s API successfully? If you uninstall the Linkerd resources, are there any warnings/errors with linkerd check --pre?
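One quick way to sanity-check API connectivity from inside the cluster (the pod name and image below are just illustrative) is to run a throwaway pod and hit the API server's health endpoint:

kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk https://kubernetes.default.svc/healthz

Even a 401/403 response is fine for this purpose: it shows the TLS handshake completing, which is exactly the step that times out in the identity pod's log.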
linkerd check --pre is all green.
This is the log of the linkerd-proxy container in the linkerd-identity pod:
...
[ 259.035640s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 259.537692s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 260.004001s] WARN ThreadId(01) policy:watch{port=8080}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-policy.linkerd.svc.cluster.local. type: SRV class: IN
[ 260.039702s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 260.541779s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 261.043786s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 261.545786s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 262.047780s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 262.549805s] WARN ThreadId(02) identity:controller{addr=localhost:8080}:endpoint{addr=127.0.0.1:8080}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
Switching from cilium to calico in Scaleway cluster configuration solves the problem.
@tensor5 This sounds like it is likely a cilium configuration issue. It seems the controllers were unable to contact the Kubernetes API server, and I'm not sure what we can change in Linkerd to address this.
Linkerd was working fine for me before in another cluster using calico. I'm experiencing the exact same issue on a new cluster with cilium: the identity pod is unable to connect to the API, yet all other pods that need the kube API can make calls and receive responses OK. Linkerd is the only one that can't.
I had a look at https://github.com/linkerd/linkerd2/issues/6246, but I don't know what I should try.
I've also checked out https://github.com/linkerd/linkerd2/issues/6238, and there has been progress on the cilium side, as the awaited PR is now merged. However, this setting doesn't seem to work for me. The cilium docs only mention that it's designed for Istio, so I'm not sure if it covers everything Linkerd needs.
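In case it helps others landing here: as far as I can tell, the setting in question is cilium's host-namespace-only socket load balancing (the bpf-lb-sock-hostns-only agent flag), and I believe it only exists from cilium 1.10 onward, so it would have no effect on 1.9.x. With the cilium Helm chart it would look roughly like this (the exact value name is my assumption and differs between chart versions):

helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set hostServices.hostNamespaceOnly=true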
Environment:
- Kubernetes 1.20.6
- Cilium 1.9.6 (hostServices.enabled=false)
- Linkerd (skip port 6443)
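To spell out the skip-port workaround (a sketch only; the port is whatever your API server listens on, 6443 here, 443 in the original report), proxy-init can be told not to intercept API server traffic cluster-wide at install time:

helm install \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key \
  --set proxyInit.ignoreOutboundPorts=6443 \
  linkerd/linkerd2

or per-workload with the config.linkerd.io/skip-outbound-ports: "6443" annotation. This only keeps the proxy from intercepting that traffic; it doesn't change the underlying cilium socket-LB behaviour.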
I have the same issue. How can I work around it and make it work? Do I need to wait for 2.12.0?
Cross-referencing #7786.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
We recently closed #9817, which is a potential fix for this issue. If you are able to confirm that fix, that would be helpful. We'll keep this open for a little bit longer.