                        linkerd_reconnect: Failed to connect error=Connection refused (os error 111) after installing 2.11.1
What is the issue?
I am seeing the errors below in the linkerd-destination pod after installing 2.11.1 in an AKS cluster; the logs from the pod are included below. We previously used 2.10 without any issues; we did not upgrade in place but installed 2.11 after removing 2.10. Please let me know if any other logs are required for troubleshooting.
How can it be reproduced?
Install the new Linkerd 2.11.1 version on an AKS cluster.
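For reference, the install was presumably along these lines (a sketch; the exact install method and flags weren't stated, and the LINKERD2_VERSION pin for the install script is assumed from the standard docs):
# Fetch a stable-2.11.1 CLI (assumes the install script honors LINKERD2_VERSION):
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | LINKERD2_VERSION=stable-2.11.1 sh
# Install the control plane and verify:
linkerd install | kubectl apply -f -
linkerd check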
Logs, error output, etc
[     0.001240s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.001570s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.001583s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.001586s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.001587s]  INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.001589s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.001591s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.001593s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[     0.002035s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.003857s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[     0.112761s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.332287s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.738942s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.240545s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.742524s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     2.067324s]  INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}: linkerd_app_core::serve: Connection closed error=TLS detection timed out
[    72.931085s]  WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[... the same "linkerd_reconnect: Failed to connect error=Connection refused (os error 111)" warning repeats roughly every 0.5s through 83.459112s ...]
Output of linkerd check -o short:
 ~ linkerd check
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days
linkerd-version
---------------
‼ can determine the latest version
    Get "https://versioncheck.linkerd.io/version.json?version=stable-2.11.1&uuid=58eb0377-e4d1-43a5-8baf-9c9c44545559&source=cli": net/http: TLS handshake timeout
    see https://linkerd.io/2.11/checks/#l5d-version-latest for hints
‼ cli is up-to-date
    unsupported version channel: stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-version-cli for hints
control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
    unsupported version channel: stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-version-control for hints
√ control plane and cli versions match
linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-7d9d7865ff-8kkzh (stable-2.11.1)
	* linkerd-identity-5f8f46575-fdzjb (stable-2.11.1)
	* linkerd-proxy-injector-56fd45796f-8m7cx (stable-2.11.1)
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-version for hints
√ control plane proxies and cli versions match
Status check results are √
Linkerd extensions checks
=========================
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
‼ linkerd-viz pods are injected
    could not find proxy container for prometheus-86bdfbd9d6-z55qz pod
    see https://linkerd.io/2.11/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    prometheus-86bdfbd9d6-24t68 status is Failed
    see https://linkerd.io/2.11/checks/#l5d-viz-pods-running for hints
× viz extension proxies are healthy
    The "linkerd-proxy" container in the "prometheus-86bdfbd9d6-24t68" pod is not ready
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints
Environment
- k8s version -- 1.20
- cluster env -- AKS
- Host OS -- linux
- Linkerd Version -- 2.11.1
I also see these errors in the linkerd-proxy container logs:
[ 33.007036s] WARN ThreadId(01) policy:watch{port=8080}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-policy.linkerd.svc.cluster.local. type: SRV class: IN
Hi @prydeep! Based on those logs, it looks like the destination controller is unable to connect to the policy controller (which runs in the same pod on port 8090). Do you see any errors in the policy controller's container logs, or any warnings in the Kubernetes events (which you can see by running kubectl describe on the destination pod)?
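For reference, a couple of commands along those lines (a sketch; the "policy" container name and the label selector are assumed and may differ in your version):
# Policy controller logs (the policy controller runs as a container in the linkerd-destination pod):
kubectl logs -n linkerd deploy/linkerd-destination -c policy
# Kubernetes events and container statuses for the destination pod:
kubectl describe pod -n linkerd -l linkerd.io/control-plane-component=destination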
Thank you for responding, @adleong. Please find the policy container logs below:
2022-04-26T18:59:50.571586Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T18:59:51.572722Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:04:10.126764Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:04:11.128036Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:05:00.598921Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:05:01.600125Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:08:29.808007Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:08:30.809124Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:09:20.473420Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:09:21.474626Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:13:25.822166Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: timed out
2022-04-26T19:13:26.824169Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:13:39.602616Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:13:40.603818Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:15:05.162470Z  INFO pods: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: timed out
2022-04-26T19:15:06.163672Z  INFO pods: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:17:45.238031Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:17:46.238604Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:17:58.886482Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:17:59.887775Z  INFO servers: linkerd_policy_controller_k8s_api::watch: Restarting
2022-04-26T19:22:04.549612Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-04-26T19:22:05.550852Z  INFO serverauthorizations: linkerd_policy_controller_k8s_api::watch: Restarting
@Team, any help is appreciated. We were running without any issues until we upgraded to 2.11.1.
We're having the same issue.
It looks like the policy controller is unable to contact the Kubernetes API. Can you try the latest Linkerd stable-2.11.2 to confirm whether the problem is still present?
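For reference, one way to do that (a sketch, assuming a CLI-based install; adjust if you installed via Helm, and the LINKERD2_VERSION pin for the install script is assumed):
# Fetch a stable-2.11.2 CLI:
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | LINKERD2_VERSION=stable-2.11.2 sh
# Upgrade the control plane and re-check:
linkerd upgrade | kubectl apply -f -
linkerd check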
It looks like the policy controller is unable to contact the Kubernetes API. Can you try the latest Linkerd stable-2.11.2 to confirm whether the problem is still present?
Tried 2.11.2, and I see the following errors in the policy container:
2022-04-30T01:57:03.013009Z  INFO grpc{port=8090}: linkerd_policy_controller: gRPC server listening addr=0.0.0.0:8090
2022-05-04T13:03:54.605543Z  WARN pods: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-05-04T13:03:54.605773Z  WARN servers: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-05-04T13:03:54.606000Z  WARN serverauthorizations: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-05-04T13:03:54.620562Z ERROR pods: kube_client::client: failed with error error trying to connect: Connection reset by peer (os error 104)
2022-05-04T13:03:54.620730Z ERROR servers: kube_client::client: failed with error error trying to connect: Connection reset by peer (os error 104)
2022-05-04T13:03:54.620778Z ERROR serverauthorizations: kube_client::client: failed with error error trying to connect: Connection reset by peer (os error 104)
2022-05-04T13:03:54.621493Z ERROR pods: kube_client::client: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-05-04T13:04:12.857362Z ERROR serverauthorizations: kube_client::client: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-05-04T13:04:12.863107Z ERROR pods: kube_client::client: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
@prydeep Were you able to resolve this issue?
I set proxy.logLevel=warn,linkerd=debug,warn and got this output:
DEBUG ThreadId(01) policy:watch{port=9443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_tls::client: Peer does not support TLS reason=loopback
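For reference, that setting corresponds to something like the following (a sketch; assumes a CLI-based install and that the proxy reads its log level from the LINKERD2_PROXY_LOG environment variable):
# Apply the proxy log level at upgrade time:
linkerd upgrade --set proxy.logLevel=warn\,linkerd=debug\,warn | kubectl apply -f -
# Confirm what the destination pod's proxy container actually received:
kubectl -n linkerd get deploy linkerd-destination -o jsonpath='{.spec.template.spec.containers[?(@.name=="linkerd-proxy")].env[?(@.name=="LINKERD2_PROXY_LOG")].value}'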
@jayjaytay I accidentally closed the issue. The issue is still there for me.
DEBUG ThreadId(01) policy:watch{port=9443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_tls::client: Peer does not support TLS reason=loopback
This is innocuous. It's indicating that the proxy shouldn't attempt mTLS to a container in the same pod.
Based on my understanding of the logs above:
- the policy controller is unable to reach the Kubernetes API for some reason
- there are no errors in the proxy related to outbound traffic

You might try installing Linkerd with --set policyController.logLevel=info\,linkerd=trace\,kubert=debug; this will enable verbose logs from the policy controller.
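Spelled out, that would look something like this (a sketch; assumes a CLI-based install, with linkerd upgrade standing in for a fresh linkerd install):
# Enable verbose policy-controller logging and re-apply the control plane:
linkerd upgrade --set policyController.logLevel=info\,linkerd=trace\,kubert=debug | kubectl apply -f -
# Then follow the policy container:
kubectl logs -n linkerd deploy/linkerd-destination -c policy -f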
Outbound traffic on port 443--where the Kubernetes API is usually hosted--is not proxied on the control plane; so I'd probably ignore the destination controller's proxy logs unless we have some indication that this traffic is being proxied.
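One way to confirm that 443 is excluded on a given control-plane pod (a sketch; the flag and annotation names are assumptions and may vary by version):
kubectl -n linkerd get pod -l linkerd.io/control-plane-component=destination -o yaml | grep -iE 'outbound-ports-to-ignore|skip-outbound-ports'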
We're running linkerd 2.11.2 on AKS and have not encountered this issue, so there are probably some missing relevant details... How were these clusters created? What CNIs are being used? What would we have to do, specifically, to try to reproduce this problem?
@olix0r Sorry for the late reply; we are using Azure CNI.
This could then be Azure/AKS#2750. @prydeep there's a small repro in there you can try to confirm that's the issue.
It also sounds like #8296 may help resolve this in some cases
This could then be Azure/AKS#2750. @prydeep there's a small repro in there you can try to confirm that's the issue.
@alpeb I tried that and yes, I saw the same thing. What is the next step? Is there a fix for this, and will it be merged into stable 2.11.2?
That's unfortunately an AKS bug, so there's nothing we can do on our side besides voicing your concern in that ticket in the hope it gets better visibility.
Thank you @alpeb
Hi @alpeb, I'm facing the same issue after upgrading from stable 2.10.2 to 2.11.2. I'm using AKS with kubenet, with k8s version 1.21.7.
All Linkerd control-plane components start failing with the same linkerd-proxy logs shown in the issue description. Is there any known mitigation here?
Logs from the linkerd-proxy container of linkerd-proxy-injector and linkerd-destination:
[0.005486s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[0.005494s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[0.008909s]  WARN ThreadId(01) daemon:identity: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[0.012103s]  WARN ThreadId(01) daemon:identity: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[0.028235s]  INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
Hey @alpeb, any updates here? We're stuck on an upgrade.
Hi @alpeb, we switched to Azure CNI for our AKS clusters; now only one of our linkerd-destination pods is failing with:
WARN ThreadId(01) policy:watch{port=8090}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
I tried the repro for Azure/AKS#2750, but that is not the case for us.
And it's only failing in one of our clusters; any idea what might be causing this? We're on Linkerd 2.11.0 (as 2.11.1 and 2.11.2 were causing the issues above).
I'm using EKS and getting this error too.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Same error in EKS as well: 2.12.2 with the Linkerd CNI plugin enabled.
Also seeing this in EKS; it popped up somewhat out of the blue and seems to affect only one deployment. We're currently running version 2.11.4 and I'm not in a position to upgrade Linkerd at the moment.
Edit: This was actually a misconfiguration in the app the sidecar was proxying. It was set to listen on 127.0.0.1:8080 instead of 0.0.0.0:8080, etc., meaning the sidecar couldn't connect to the app! All fine now.
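For anyone hitting the same symptom, a quick way to check the bind address from inside the pod (a sketch; assumes the app image ships ss or netstat, which it may not, and the placeholders are hypothetical):
kubectl exec -n <namespace> <pod-name> -c <app-container> -- ss -lnt
# Compare the local address column against what the proxy is expected to reach
# (e.g. 0.0.0.0:8080 rather than 127.0.0.1:8080, per the comment above).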
I'm going to close this issue out for now. This was originally opened for stable 2.11.1 and we are now on stable 2.12.2. We have been unable to reproduce this and enough comments have happened about slightly different issues that I feel we've deviated from the parent issue.
If you do see this, please feel free to open a new issue with as much detail as possible. Thanks!