envoy icon indicating copy to clipboard operation
envoy copied to clipboard

`response.grpc_status` is set to 2 for plain HTTP requests with 2xx response status

Open mprokopjc opened this issue 1 year ago • 11 comments

Title: response.grpc_status is set to 2 for plain HTTP requests (with 2xx HTTP status code)

Description:

What issue is being seen? Describe what should be happening instead of the bug, for example: Envoy should not crash, the expected value isn't returned, etc.

response.grpc_status is set to 2 for plain HTTP requests (with 2xx HTTP status code). This causes unexpected results when filtering access logs with expression likes this: expression: "response.code >= 400 || response.grpc_status > 0.

I would expect response.grpc_status to be set to -1 or not set at all when request isn't gRPC request.

Repro steps:

Include sample requests, environment, etc. All data and inputs required to reproduce the bug.

Use Istio v1.19.3 (that uses Envoy with sha256: c3e7c2a9c657247c857b269f6a86099515d9bf294a1aff70c5d1ef2a562cdde3 - link) and configure its access log to contain response.grpc_status (see Config section below). Then performance HTTP request (any simple GET request is sufficient) that results in 200 OK response.

Config:

Include the config used to configure Envoy.

accessLogFormat: "[%START_TIME%] %CEL(response.grpc_status)% \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% \"%UPSTREAM_TRANSPORT_FAILURE_REASON%\" %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\" %UPSTREAM_CLUSTER% %UPSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_REMOTE_ADDRESS% %REQUESTED_SERVER_NAME% %ROUTE_NAME%\n"

Logs:

Include the access logs and the Envoy logs.

[2023-12-21T09:07:13.121Z] 2 "GET /get HTTP/1.1" 200 - via_upstream "-" 0 600 1 1 "-" "curl/8.5.0" "371a0dc2-c607-967f-a7b2-010d0a46f551" "httpbin:8000" "10.244.0.23:8080" outbound|8000||httpbin.default.svc.cluster.local 10.244.0.24:47768 10.96.235.15:8000 10.244.0.24:47234 - default

Above you can see that response.grpc_status is unexpectedly 2

mprokopjc avatar Dec 21 '23 12:12 mprokopjc

I'm not sure what CEL(response.grpc_status) is supposed to do here, but I'm guessing it is either initialized to Unknown or there's something happening in some conversion.

@kyessenov do you happen to know what could be the issue here?

adisuissa avatar Jan 03 '24 18:01 adisuissa

I'm observing a similar issue for gRPC-web requests - adding %GRPC_STATUS% to my access logs always seems to report Unknown even though I can see the requests are successful (%RESP(grpc-status)% is always -, and %CEL(response.grpc_status)% leads to a Not supported field in StreamInfo: CEL error). Notably something thinks this is a gRPC request because non-gRPC requests print - instead of Unknown.

dimo414 avatar Jan 31 '24 23:01 dimo414

Same issue as @dimo414 here, i cannot find anything about it in the web

faculbsz-sg avatar Feb 14 '24 14:02 faculbsz-sg

This is computed in Utility::httpToGrpcStatus where HTTP code is translated to gRPC code. The translation treats 200 as "Unknown" for some reason - I think it's because all gRPCs return 200 with a special trailer or header for the actual response code. So if you have an interrupted gRPC call, you cannot infer gRPC return code, hence, "unknown".

kyessenov avatar Feb 22 '24 21:02 kyessenov

Similar issues have been discussed here in the past

https://github.com/envoyproxy/envoy/issues/7954

However, even if I set the replacement string according to the access log manual, they all display the same value. https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage.html

text_format: %GRPC_STATUS(:SNAKE_STRING)% %GRPC_STATUS(:SNAKE_STRING)% %GRPC_STATUS(:NUMBER)%

-> Unknown Unknown Unknown

aki-caffeine avatar Mar 12 '24 05:03 aki-caffeine

We have the same issue where our gRPC Web requests are reported to have %GRPC_STATUS(NUMBER)% = 2. I like the suggestion to have non-gRPC requests report as -1. Can we make that an option? Or are there any workarounds to see if the request was a gRPC or plain HTTP request?

hermanbanken avatar Mar 14 '24 08:03 hermanbanken

We have the same issue where our gRPC Web requests are reported to have %GRPC_STATUS(NUMBER)% = 2. I like the suggestion to have non-gRPC requests report as -1. Can we make that an option? Or are there any workarounds to see if the request was a gRPC or plain HTTP request?

@hermanbanken One of possible workarounds is using following expression (('content-type' in request.headers) && request.headers['content-type'].contains('grpc') && response.grpc_status >= 0) to filter access logs. This works; however, using contains isn't super nice

mprokopjc avatar Mar 14 '24 11:03 mprokopjc

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 13 '24 12:04 github-actions[bot]

What even is the point of stalebot.

dimo414 avatar Apr 15 '24 05:04 dimo414

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 15 '24 08:05 github-actions[bot]

Can someone disable the stalebot on this bug?

dimo414 avatar May 15 '24 17:05 dimo414