
linkerd-proxy panics when retrying wire-grpc requests

Open Hexcles opened this issue 2 years ago • 20 comments

What is the issue?

We saw elevated client errors after enabling retries for some gRPC routes in our service profile. Linkerd metrics show inbound requests for this route are much higher than outbound requests. After digging in, we found panics in the logs of the client-side linkerd-proxy.

How can it be reproduced?

(We are trying to produce a minimal, open-source repro. FWIW, we use https://square.github.io/wire/wire_grpc/ rather than the standard gRPC libraries.)

Logs, error output, etc

thread 'main' panicked at 'if our `state` was `None`, the shared state must be `Some`', /__w/linkerd2-proxy/linkerd2-proxy/linkerd/http-retry/src/replay.rs:152:22

output of linkerd check -o short

linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2023-10-25T10:38:18Z
    see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
    certificate will expire on 2023-10-25T10:38:27Z
    see https://linkerd.io/2.14/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
    certificate will expire on 2023-10-25T10:38:38Z
    see https://linkerd.io/2.14/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
    certificate will expire on 2023-10-25T18:50:02Z
    see https://linkerd.io/2.14/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints

linkerd-viz
-----------
‼ tap API server cert is valid for at least 60 days
    certificate will expire on 2023-10-26T00:25:04Z
    see https://linkerd.io/2.14/checks/#l5d-tap-cert-not-expiring-soon for hints

Status check results are √

Environment

  • Kubernetes 1.24
  • EKS
  • Linux
  • Linkerd 2.14.1

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

Hexcles avatar Oct 24 '23 04:10 Hexcles

https://github.com/linkerd/linkerd2-proxy/blob/986d45895c0945152828e1286f7b5714520a86ba/linkerd/http-retry/src/replay.rs#L152

Hexcles avatar Oct 24 '23 04:10 Hexcles

Some debug logging:

{"timestamp":"[  1269.373381s]","level":"DEBUG","fields":{"message":"client connection open"},"target":"linkerd_transport_metrics::client","spans":[{"name":"inbound"},{"port":80,"name":"server"},{"name":"backend-web.default.svc.cluster.local:80","name":"http"},{"name":"profile"},{"name":"http1"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.375560s]","level":"DEBUG","fields":{"state":"Some(State { classify: Grpc(Codes({2, 4, 7, 13, 14, 15})), tx: Sender { chan: Tx { inner: Chan { tx: Tx { block_tail: 0x7f1dd886c700, tail_position: 0 }, semaphore: Semaphore { semaphore: Semaphore { permits: 10000 }, bound: 10000 }, rx_waker: AtomicWaker, tx_count: 2, rx_fields: \"...\" } } } })"},"target":"linkerd_proxy_http::classify::channel","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"},{"name":"sessions-web","ns":"default","port":"80","name":"service"},{"addr":"172.17.80.145:8080","name":"endpoint"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.375597s]","level":"DEBUG","fields":{"method":"POST","uri":"http://sessions-web/com.session.Sessions/WhoisByCookie","version":"HTTP/2.0"},"target":"linkerd_proxy_http::client","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"},{"name":"sessions-web","ns":"default","port":"80","name":"service"},{"addr":"172.17.80.145:8080","name":"endpoint"},{"name":"h2"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.375605s]","level":"DEBUG","fields":{"headers":"{\"te\": \"trailers\", \"grpc-trace-bin\": \"\", \"grpc-accept-encoding\": \"gzip\", \"grpc-encoding\": \"gzip\", \"x-datadog-trace-id\": \"4009838577945735206\", \"x-datadog-parent-id\": \"6986014011649582376\", \"x-datadog-sampling-priority\": \"-1\", \"x-datadog-tags\": \"_dd.p.dm=-3\", \"traceparent\": \"00-000000000000000037a5cef10d1cf026-60f34ebae8466528-00\", \"tracestate\": \"dd=t.dm:-3\", \"rop\": \"803a8303e5668f0e058c2080c10c222d\", \"ropt\": \"http.handler\", \"pop\": \"803a8303e5668f0e058c2080c10c222d\", \"popt\": \"http.handler\", \"grpc-timeout\": \"29999m\", \"x-if-wsat\": \"<1KB of secrets>\", \"user-agent\": \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36\", \"content-type\": \"application/grpc\", \"accept-encoding\": \"gzip\", \"l5d-dst-canonical\": \"sessions-web.default.svc.cluster.local:80\"}"},"target":"linkerd_proxy_http::client","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"},{"name":"sessions-web","ns":"default","port":"80","name":"service"},{"addr":"172.17.80.145:8080","name":"endpoint"},{"name":"h2"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.385088s]","level":"DEBUG","fields":{"message":"Remote proxy error"},"target":"linkerd_app_outbound::http::handle_proxy_error_headers","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"},{"name":"sessions-web","ns":"default","port":"80","name":"service"},{"addr":"172.17.80.145:8080","name":"endpoint"}],"threadId":"ThreadId(1)"}
thread 'main' panicked at 'if our `state` was `None`, the shared state must be `Some`', /__w/linkerd2-proxy/linkerd2-proxy/linkerd/http-retry/src/replay.rs:152:22
{"timestamp":"[  1269.385179s]","level":"DEBUG","fields":{"message":"dropping ResponseBody"},"target":"linkerd_proxy_http::classify::channel","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.385191s]","level":"DEBUG","fields":{"message":"sending EOS to classify"},"target":"linkerd_proxy_http::classify::channel","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.385631s]","level":"DEBUG","fields":{"message":"The client is shutting down the connection","res":"Ok(())"},"target":"linkerd_proxy_http::server","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"},{"addr":"10.100.169.20:80","name":"proxy"},{"name":"http"}],"threadId":"ThreadId(1)"}
{"timestamp":"[  1269.385671s]","level":"DEBUG","fields":{"message":"Connection closed"},"target":"linkerd_app_core::serve","spans":[{"name":"outbound"},{"client.addr":"172.17.75.208:58594","server.addr":"10.100.169.20:80","name":"accept"}],"threadId":"ThreadId(1)"}

Hexcles avatar Dec 06 '23 06:12 Hexcles

@hawkw do you know how to enable RUST_BACKTRACE in linkerd-proxy?

Hexcles avatar Dec 06 '23 06:12 Hexcles

@Hexcles were you able to create a repro for this?

wmorgan avatar Dec 18 '23 18:12 wmorgan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 19 '24 00:03 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 03 '24 18:07 stale[bot]

Still happening. The panic site has moved, though:

https://github.com/linkerd/linkerd2-proxy/blob/837fbc9531844e5f10d7f4480555127236e6a09b/linkerd/http/retry/src/replay.rs#L152

Working on a repro

Hexcles avatar Jul 10 '24 16:07 Hexcles

OK here's my complete repro:

https://github.com/Hexcles/wire/blob/grpc-sample/samples/wire-grpc-sample/k8s.yaml

  1. Create a new k8s cluster (I used kind)
  2. Install linkerd (I used linkerd CLI)
  3. kubectl apply -f k8s.yaml
  4. Wait for the pods to become ready and watch the linkerd-proxy logs in the client pod: you'll see a panic within a minute

Hexcles avatar Jul 11 '24 21:07 Hexcles

I notice that your proto is:

service Whiteboard {
  rpc Whiteboard (stream WhiteboardCommand) returns (stream WhiteboardUpdate) {
  }

  rpc Echo (Point) returns (Point) {
  }
}

Are you exercising both RPCs in this scenario?

olix0r avatar Jul 17 '24 18:07 olix0r

Nope, only the Echo. I didn't test the streaming version actually. I added the unary call for a simpler repro.

So here's the server-side code exercised:

https://github.com/Hexcles/wire/blob/fa9f1e2b7d16fc2364a62b45381d42dd9323a439/samples/wire-grpc-sample/server/src/main/java/com/squareup/wire/whiteboard/WhiteboardGrpcAction.kt#L39-L41

And client-side code:

https://github.com/Hexcles/wire/blob/fa9f1e2b7d16fc2364a62b45381d42dd9323a439/samples/wire-grpc-sample/client-simple/src/main/java/com/squareup/wire/whiteboard/SimpleGrpcClient.kt#L14

Hexcles avatar Jul 17 '24 19:07 Hexcles

Note that both sides use wire-grpc, not the upstream grpc-java from Google. They are supposedly compatible on the wire, but apparently there's something unique about the frames produced by wire-grpc (otherwise, you'd have seen a lot of bug reports from gRPC users already).

Hexcles avatar Jul 17 '24 19:07 Hexcles

Thanks. This repro will be enough for us to track this down.

We're currently working on some other retry improvements (that will also address #12826). The good news is that I've tried your repro against the branch of new work. We're going to prioritize making the new functionality available in an edge release, but we'll follow up to ensure this underlying issue is eliminated.

olix0r avatar Jul 17 '24 21:07 olix0r

The good news is that I've tried your repro against the branch of new work.

Do you mean you can reproduce the panic on stable, and the WIP feature in edge no longer exhibits the panic? That's great news!

Hexcles avatar Jul 17 '24 21:07 Hexcles

Ah, yeah. The WIP fixes the issue.

I believe it's caused by inconsistent framing emitted by wire-grpc...

A typical stream looks like:

[     http:Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(3) }
[     http:Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(3), flags: (0x1: END_STREAM) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }

Importantly, there is a data frame with an END_STREAM flag.

On the second request, however, no such END_STREAM is set:

[     http:Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
[     http:Connection{peer=Server}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(3) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
[     http: linkerd_proxy_http::classify::channel: dropping ResponseBody
[     http:Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
[     http:Connection{peer=Server}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(5), flags: (0x4: END_HEADERS) }
[     http:Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(5) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}: linkerd_proxy_http::classify::channel: state=Some(State { classify: Grpc(Codes({2, 4, 7, 13, 14, 15})), tx: Sender { chan: Tx { inner: Chan { tx: Tx { block_tail: 0x7f4d96031e00, tail_position: 0 }, semaphore: Semaphore { semaphore: Semaphore { permits: 10000 }, bound: 10000 }, rx_waker: AtomicWaker, tx_count: 2, rx_fields: "..." } } } })
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint: linkerd_proxy_http::client: method=POST uri=http://server/com.squareup.wire.whiteboard.Whiteboard/Echo version=HTTP/2.0
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(3) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
[     service{ns=default name=server port=80}:pool:endpoint{addr=10.42.0.80:8080}:http.endpoint:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(3) }

When the server responds before the request stream has completed, it appears to put the retry middleware into a bad state... But this is valid at the protocol level and in any case we should never crash here...
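
To make the panic message concrete: the retry middleware wraps the request body so it can be replayed, and the original body and its retry clone hand a single chunk of buffered state back and forth through a shared slot. The assertion that panics encodes the assumption that whenever one clone doesn't hold the state locally, the other clone must already have parked it in the shared slot. Here's a minimal, self-contained sketch of that hand-off pattern; the types and names are hypothetical illustrations, not the actual replay.rs code:

use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the buffered request data a replayable body
// keeps around so the request can be resent; not the real proxy types.
struct State {
    // buffered request chunks, read cursor, etc. would live here
}

// Two clones of the request body (the original send and the retry) share
// one slot; whichever clone is actively being polled holds the `State` and
// is expected to park it back in the shared slot when it is finished.
struct ReplayBody {
    local: Option<State>,
    shared: Arc<Mutex<Option<State>>>,
}

impl ReplayBody {
    fn acquire_state(&mut self) -> &mut State {
        if self.local.is_none() {
            // Mirrors the panicking assertion: if this clone does not hold
            // the state, the other clone must already have returned it.
            let taken = self.shared.lock().unwrap().take();
            self.local = Some(taken.expect(
                "if our `state` was `None`, the shared state must be `Some`",
            ));
        }
        self.local.as_mut().unwrap()
    }
}

fn main() {
    let shared = Arc::new(Mutex::new(Some(State {})));
    let mut original = ReplayBody { local: None, shared: shared.clone() };
    let mut retry = ReplayBody { local: None, shared };

    // The original body takes the state; if it never finishes (e.g. the
    // response completes while the request stream is still open) it never
    // parks the state back in the shared slot...
    let _ = original.acquire_state();

    // ...so when the retry clone is polled, both its local slot and the
    // shared slot are empty, and the `expect` above panics with exactly
    // the message seen in the proxy logs.
    let _ = retry.acquire_state();
}

In the frame sequence above, the response completes (END_STREAM on the response headers) while the request DATA frame has not yet been terminated, which appears to be the ordering that leaves that assumption violated.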

We'll update the issue when something is available to test on edge.

olix0r avatar Jul 18 '24 16:07 olix0r

edge-24.7.5 includes support for GRPCRoute resource annotations that enable timeout and retry configuration. We'll be working on more official documentation, but I wanted to share a quick demo of how to use these new configs. I've updated the wire-grpc example manifests with a route configuration like:

---
kind: GRPCRoute
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
  name: whiteboard-echo
  annotations:
    retry.linkerd.io/grpc: internal
    retry.linkerd.io/limit: "2"
    retry.linkerd.io/timeout: 150ms
    timeout.linkerd.io/request: 1s
spec:
  parentRefs:
    - name: whiteboard
      kind: Service
      group: core
  rules:
    - matches:
        - method:
            type: Exact
            service: com.squareup.wire.whiteboard.Whiteboard
            method: Echo
...

The retry.linkerd.io/grpc annotation can be used to configure a list of status codes:

metadata:
  annotations:
    retry.linkerd.io/grpc: cancelled,deadline-exceeded,internal,resource-exhausted,unavailable

While the demo app doesn't actually trigger timeouts or retries, we are able to observe gRPC-status aware route metrics:

# HELP outbound_grpc_route_request_duration_seconds The time between request initialization and response completion.
# TYPE outbound_grpc_route_request_duration_seconds histogram
# UNIT outbound_grpc_route_request_duration_seconds seconds
outbound_grpc_route_request_duration_seconds_sum{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 2.269708098
outbound_grpc_route_request_duration_seconds_count{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
outbound_grpc_route_request_duration_seconds_bucket{le="0.05",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
outbound_grpc_route_request_duration_seconds_bucket{le="0.5",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
outbound_grpc_route_request_duration_seconds_bucket{le="1.0",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
outbound_grpc_route_request_duration_seconds_bucket{le="10.0",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
outbound_grpc_route_request_duration_seconds_bucket{le="+Inf",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 197
# HELP outbound_grpc_route_request_statuses Completed request-response streams.
# TYPE outbound_grpc_route_request_statuses counter
outbound_grpc_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",grpc_status="OK",error=""} 197
# HELP outbound_grpc_route_backend_requests The total number of requests dispatched.
# TYPE outbound_grpc_route_backend_requests counter
outbound_grpc_route_backend_requests_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
# HELP outbound_grpc_route_backend_response_duration_seconds The time between request completion and response completion.
# TYPE outbound_grpc_route_backend_response_duration_seconds histogram
# UNIT outbound_grpc_route_backend_response_duration_seconds seconds
outbound_grpc_route_backend_response_duration_seconds_sum{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 0.33726197
outbound_grpc_route_backend_response_duration_seconds_count{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="0.025",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="0.05",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="0.1",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="0.25",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="0.5",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="1.0",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="10.0",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
outbound_grpc_route_backend_response_duration_seconds_bucket{le="+Inf",parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name=""} 197
# HELP outbound_grpc_route_backend_response_statuses Completed responses.
# TYPE outbound_grpc_route_backend_response_statuses counter
outbound_grpc_route_backend_response_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo",backend_group="core",backend_kind="Service",backend_namespace="default",backend_name="whiteboard",backend_port="80",backend_section_name="",grpc_status="OK",error=""} 197
# HELP outbound_grpc_route_retry_limit_exceeded Retryable requests not sent due to retry limits.
# TYPE outbound_grpc_route_retry_limit_exceeded counter
outbound_grpc_route_retry_limit_exceeded_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 0
# HELP outbound_grpc_route_retry_overflow Retryable requests not sent due to circuit breakers.
# TYPE outbound_grpc_route_retry_overflow counter
outbound_grpc_route_retry_overflow_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 0
# HELP outbound_grpc_route_retry_requests Retry requests emitted.
# TYPE outbound_grpc_route_retry_requests counter
outbound_grpc_route_retry_requests_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 0
# HELP outbound_grpc_route_retry_successes Successful responses to retry requests.
# TYPE outbound_grpc_route_retry_successes counter
outbound_grpc_route_retry_successes_total{parent_group="core",parent_kind="Service",parent_namespace="default",parent_name="whiteboard",parent_port="80",parent_section_name="",route_group="gateway.networking.k8s.io",route_kind="GRPCRoute",route_namespace="default",route_name="whiteboard-echo"} 0

I'll leave this issue open until we ensure the underlying bug is fixed in the ServiceProfile router as well.

olix0r avatar Jul 26 '24 19:07 olix0r

IIUC, HTTPRoute doesn't work with ServiceProfile. Does GRPCRoute also not work with ServiceProfile?

Hexcles avatar Jul 26 '24 20:07 Hexcles

Correct, this is a mutually exclusive routing interface.

olix0r avatar Jul 26 '24 22:07 olix0r

Apologies for the nudge, but any plan to fix this in ServiceProfile soon-ish? Thanks!

Hexcles avatar Aug 07 '24 23:08 Hexcles

@Hexcles Hey, nothing concrete yet – we're working out how to get this done.

kflynn avatar Aug 08 '24 16:08 kflynn