HTTPRoute intermittently fails to distribute traffic

Open Sierra1011 opened this issue 9 months ago • 1 comments

What is the issue?

When using an HTTPRoute to dynamically redistribute load from one Service to a multicluster mirrored Service, traffic is only delivered intermittently: some requests succeed, while others fail with a 504 Gateway Timeout.

How can it be reproduced?

  • 2 clusters, east and west, joined by a multicluster link that mirrors appropriately labelled services deployed in west into east
  • a Service foo in cluster east (but no deployment to receive traffic)
  • a mirrored Service in east called foo-west. This should pass traffic to a deployment of something that returns basic acks, e.g. in response to curl requests.
  • an HTTPRoute directing traffic received by parentRef Service foo to backendRef foo-west.
  • send traffic to
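For reference, an HTTPRoute matching the steps above might look roughly like the sketch below. This is not the manifest from the affected cluster: the route name `foo-to-west`, the namespace `default`, port 80, and the choice of the `policy.linkerd.io/v1beta3` API version are all assumptions made for illustration, and the backend is assumed to be the mirrored Service `foo-west`.

```yaml
# Hypothetical reproduction manifest; names, namespace and port are assumed.
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: foo-to-west        # assumed name
  namespace: default       # assumed namespace
spec:
  parentRefs:
    # Traffic addressed to Service foo in east (which has no local deployment)
    - name: foo
      kind: Service
      group: core
      port: 80             # assumed port
  rules:
    - backendRefs:
        # Mirrored Service created by the multicluster link to west
        - name: foo-west
          port: 80         # assumed port
```

With a single backendRef and no weights, every request sent to foo would be expected to land on foo-west, so any request that is resolved against the empty local Service instead would show up as a failure.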

Logs, error output, etc

Application request logs (wget from a busybox pod):

❯ kubectl exec -it busybox-5cd4968444-zn549 -- wget http://APP.APP.svc.cluster.local/ping -O -
Defaulted container "main" out of: main, linkerd-init (init), linkerd-proxy (init)
Connecting to APP.APP.svc.cluster.local (IPADDR:80)
writing to stdout
written to stdout

☸ non-prod
❯ kubectl exec -it busybox-5cd4968444-zn549 -- wget http://APP.APP.svc.cluster.local/ping -O -
Defaulted container "main" out of: main, linkerd-init (init), linkerd-proxy (init)
Connecting to APP.APP.svc.cluster.local (IPADDR:80)
wget: server returned error: HTTP/1.1 504 Gateway Timeout
command terminated with exit code 1

Proxy sidecar:

[   853.183882s]  INFO ThreadId(01) outbound:proxy{addr=10.100.238.202:80}:service{ns=APP name=APP port=80}: linkerd_proxy_api_resolve::resolve: No endpoints
[   856.184109s]  INFO ThreadId(01) outbound:proxy{addr=10.100.238.202:80}:service{ns=APP name=APP port=80}: linkerd_proxy_balance_queue::worker: Unavailable; entering failfast timeout=3.0
[   856.184575s]  INFO ThreadId(01) outbound:proxy{addr=10.100.238.202:80}:rescue{client.addr=172.27.8.216:48586}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=logical service 10.100.238.202:80: route default.http: backend Service.APP.APP:80: Service.APP.APP:80: service in fail-fast error.sources=[route default.http: backend Service.APP.APP:80: Service.APP.APP:80: service in fail-fast, backend Service.APP.APP:80: Service.APP.APP:80: service in fail-fast, Service.APP.APP:80: service in fail-fast, service in fail-fast]

Output of linkerd check -o short

❯ linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    is running version 24.3.2 but the latest edge version is 24.5.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 24.5.1 but the latest edge version is 24.5.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-24.5.1 but cli running edge-24.3.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-888c96b5b-7pwmc (edge-24.5.1)
        * linkerd-destination-888c96b5b-hl54h (edge-24.5.1)
        * linkerd-destination-888c96b5b-vn62f (edge-24.5.1)
        * linkerd-identity-56bbfdc7b6-2cfhj (edge-24.5.1)
        * linkerd-identity-56bbfdc7b6-f9bvq (edge-24.5.1)
        * linkerd-identity-56bbfdc7b6-h67sk (edge-24.5.1)
        * linkerd-proxy-injector-68c6b7bc6-5vxm6 (edge-24.5.1)
        * linkerd-proxy-injector-68c6b7bc6-hgmks (edge-24.5.1)
        * linkerd-proxy-injector-68c6b7bc6-l45wh (edge-24.5.1)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-888c96b5b-7pwmc running edge-24.5.1 but cli running edge-24.3.2
    see https://linkerd.io/2/checks/#l5d-cp-proxy-cli-version for hints

linkerd-jaeger
--------------
‼ jaeger extension proxies are up-to-date
    some proxies are not running the current version:
        * collector-7db4655-sdwth (edge-24.5.1)
        * jaeger-5c4c9ff587-5c729 (edge-24.5.1)
        * jaeger-injector-6cb867b4f8-5mhnd (edge-24.5.1)
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cp-version for hints
‼ jaeger extension proxies and cli versions match
    collector-7db4655-sdwth running edge-24.5.1 but cli running edge-24.3.2
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cli-version for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-db8857cf8-mfw6c (edge-24.5.1)
        * metrics-api-db8857cf8-p59sg (edge-24.5.1)
        * metrics-api-db8857cf8-wxm87 (edge-24.5.1)
        * tap-6d6cf4c465-2rzj8 (edge-24.5.1)
        * tap-6d6cf4c465-8bshr (edge-24.5.1)
        * tap-6d6cf4c465-bg6sd (edge-24.5.1)
        * tap-injector-66c6f694f4-7rwx4 (edge-24.5.1)
        * tap-injector-66c6f694f4-9hjpw (edge-24.5.1)
        * tap-injector-66c6f694f4-vqw6r (edge-24.5.1)
        * web-56d54f864d-82jcp (edge-24.5.1)
        * web-56d54f864d-j4vbv (edge-24.5.1)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    metrics-api-db8857cf8-mfw6c running edge-24.5.1 but cli running edge-24.3.2
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

  • Kubernetes v1.29.3
  • EKS cluster
  • Bottlerocket nodes
  • Cilium CNI in AWS VPC replacement mode

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

maybe

Sierra1011 · May 16 '24 15:05