proxy_latency_response_out is always equal to DURATION time of the request
Gloo Edge Product
Enterprise
Gloo Edge Version
1.15.x
Kubernetes Version
1.26
Describe the bug
Testing with proxyLatency response_out, I observed that
proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'
Is always equal to:
DURATION: '%DURATION%'
Expected Behavior
For me, latency for response_out should only measure the latency on response time, not having the same value as duration of the total request.
Steps to reproduce the bug
Gateway object:
apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
annotations:
labels:
app: gloo
name: gateway-proxy
namespace: gloo-system
spec:
bindAddress: '::'
bindPort: 8080
httpGateway:
options:
proxyLatency:
measureRequestInternally: true
request: LAST_INCOMING_LAST_OUTGOING
response: LAST_INCOMING_LAST_OUTGOING
options:
accessLoggingService:
accessLog:
- fileSink:
jsonFormat:
DURATION: '%DURATION%'
proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'
path: /dev/stdout
proxyNames:
- gateway-proxy
ssl: false
useProxyProto: false
Inject some fault injection on request:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
name: vs
namespace: gloo-system
spec:
virtualHost:
domains:
- '*'
routes:
- matchers:
- prefix: /
options:
faults:
delay:
fixedDelay: 2s
percentage: 100
routeAction:
single:
upstream:
name: httpbin-httpbin-8000
namespace: gloo-system
As you see, I have a delay of 2s on the request, however, if I check my logs:
{"DURATION":2006,"proxy_latency_response_out":2006}
proxy_latency_response_out is always taking the total response time
Additional Environment Detail
No response
Additional Context
No response
┆Issue is synchronized with this Asana task by Unito
Reproducer project: https://github.com/DuncanDoyle/ge-gloo-9317
Proxy latency definitely needs more docs.
However this is what it does: it gets the total time in the proxy so that it can report what the overhead of using the proxy is.
In this case the reproduction steps involve setting a delay via the fault filter https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/fault_filter which can do things like simulate a delay on a request.
Unfortunately in this case we have the following issue fault filter is technically in envoy and therefore counts towards proxy latency filters time.
EDIT: removing a section previously here about nighthawk. While it had mostly correct info it actually had a piece that was not quite right. Clarifying comment below.
Reopened for two parts.
Issue as stated here: DURATION versus request_out how are they different. Today the dynamic metadata of request_out is recorded as the first byte sent to the downstream while DURATION is when the whole requestresponse is done. This means that the difference in the two is minimal.
Gap: Proxy latency filter today emits 2 types of artifacts. It emits 3 fields in DYNAMIC_METADATA which are mainly for internal use. These are as follows 1.response_out: the time when the first byte is sent downstream 2. request_out: the time when the first byte is sent upstream 3.request_out_internal: the time until the first byte of headers is processed Proxy latency filter also emits STATS as an artifact: These are proxy_latency.rq_proxy_latency and proxy_latency.resp_proxy_latency the exact value of these is tunable via configuration as found here https://docs.solo.io/gloo-edge/latest/reference/api/github.com/solo-io/gloo/projects/gloo/api/external/envoy/extensions/proxylatency/proxylatency.proto.sk/#measurement
Possible follow ups:
- Improve documentation on this in main docs
- Expose the stats as metadata as well
After further investigation it seems like we should mainly push users to recreate these stats if needed in metadata. Most of the possible setups we have with stats can be derived via the current metadata from this along with the upstream operators REQUEST_TX_DURATION RESPONSE_TX_DURATION
Put up PR for this.
Includes some additional documentation around how this plays with commun_duration https://github.com/envoyproxy/envoy/pull/33240