linkerd2
Connection refused randomly for pairs of pods
What is the issue?
I am running into a difficult-to-reproduce issue where one of our k8s pods stops serving certain clients. The client proxy logs show:
WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
And:
INFO ThreadId(01) outbound:proxy{addr=10.100.32.3:10079}:rescue{client.addr=172.28.187.94:55562}: linkerd_app_core::errors::respond: gRPC request failed error=logical service service-name.namespace.svc.cluster.local:10079: service unavailable error.sources=[service unavailable]
However, during this time the service still connects to other clients and serves their requests, so the failure only affects particular client/service pairs. Restarting the clients has no effect, and restarting the service only sometimes helps: some clients reconnect while others still fail.
The only 'solution' we've had success with is restarting every single linkerd container and every service that has a proxy, which is not ideal to say the least.
While I have no solid repro, I'm hoping to at least take away some debugging tips for the next time this happens to us.
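As a starting point for triage, here is a minimal sketch that tallies failure lines per outbound target from client proxy logs, to show which destinations a given client cannot reach. It assumes the log lines look like the two samples above. The function name `tally_failures` and the two marker strings are my own choices for illustration, not anything provided by linkerd.

```python
import re
from collections import Counter

# Matches the outbound target in lines such as
# "outbound:proxy{addr=10.100.32.3:10079}:rescue{client.addr=...}".
ADDR_RE = re.compile(r"outbound:proxy\{addr=([^}]+)\}")

# Assumption: these two substrings are enough to spot the failures
# quoted in this report; other error texts are not covered.
FAILURE_MARKERS = ("Connection refused", "service unavailable")


def tally_failures(lines):
    """Count failure lines per outbound target address.

    Lines that carry a failure marker but no outbound addr (for example
    the linkerd_reconnect warning) are counted under "unknown".
    """
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in FAILURE_MARKERS):
            continue
        match = ADDR_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed an open file containing saved client proxy logs (for example the output of `kubectl logs` for the `linkerd-proxy` container) and the resulting counter inspected with `most_common()`.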
How can it be reproduced?
Unfortunately, I have not been able to reliably reproduce it in our own environments.
Logs, error output, etc
Proxy logs from the service:
[ 0.001766s] INFO ThreadId(01) linkerd2_proxy: release 2.210.0 (85db2fc) by linkerd on 2023-09-21T21:24:58Z
[ 0.002498s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.003107s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.003116s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.003118s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.003121s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.003122s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.namespace.serviceaccount.identity.linkerd.cluster.local
[ 0.003124s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.003126s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.019669s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=default.namespace.serviceaccount.identity.linkerd.cluster.local
[ 0.001800s] INFO ThreadId(01) linkerd2_proxy: release 2.210.0 (85db2fc) by linkerd on 2023-09-21T21:24:58Z
[ 0.002498s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.003148s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.003164s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.003166s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.003168s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.003171s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.namespace.serviceaccount.identity.linkerd.cluster.local
[ 0.003173s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.003175s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.012067s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=default.namespace.serviceaccount.identity.linkerd.cluster.local
Logs from the client proxy are included above.
output of linkerd check -o short
---------------
‼ cli is up-to-date
unsupported version channel: stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
unsupported version channel: stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-6954bdcf79-6p7z5 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-destination-6954bdcf79-df9f2 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-destination-6954bdcf79-jnncs (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-identity-5958cdbd64-gc2qp (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-identity-5958cdbd64-ph8v8 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-identity-5958cdbd64-qsh5m (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-proxy-injector-7664c7cf84-77vl9 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-proxy-injector-7664c7cf84-khhfp (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* linkerd-proxy-injector-7664c7cf84-xzz9x (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
linkerd-destination-6954bdcf79-6p7z5 running 3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda but cli running stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints
linkerd-viz
-----------
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* grafana-6c4c8b997d-ptswf (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* metrics-api-7d685f8896-f4d52 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* prometheus-dd8b5b7f4-2rsgn (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* tap-59769cd568-7t92z (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* tap-injector-6f987fddf9-f9fs5 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
* web-7c6ff5b7d-7tdb6 (3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda)
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
grafana-6c4c8b997d-ptswf running 3cd7d7a0849f124af2156783ae1989d0a1248d412341cd97f781e60feae98dda but cli running stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints
Status check results are √
Environment
linkerd_controller: stable-2.14.1
linkerd_debug: stable-2.14.1
linkerd_grafana: stable-2.11.1
linkerd_metrics_api: stable-2.14.1
linkerd_policy_controller: stable-2.14.1
linkerd_proxy: stable-2.14.1
linkerd_proxy_init: v2.2.3
linkerd_tap: stable-2.14.1
linkerd_web: stable-2.14.1
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None