Add metric that matches the duration in access log

Open jobergum opened this issue 1 year ago • 3 comments

Due to https://github.com/vespa-engine/vespa/issues/26408, the duration logged in the access log can be considerably higher than query_latency. The access log duration field also includes the time it takes to render the response.

I think we should have a metric that corresponds exactly to the duration logged in the access log, with percentile calculations as a bonus.

jobergum avatar Mar 13 '23 10:03 jobergum

Agreed, and the container.handled.latency.sum metric with the correct handler tag should be a much closer match here. There are no percentiles for this, though, but I believe we should use histograms rather than percentiles going forward.

yngveaasheim avatar Mar 13 '23 11:03 yngveaasheim

https://github.com/vespa-engine/vespa/pull/27120 changes StatisticsSearcher to use the correct request timestamp. The previous timestamp did not account for initial request processing in Jetty.

The handled.latency metric covers the latency up to the start of the response; it does not include the time for producing and sending the response content. The serverTotalSuccessfulResponseLatency and serverTotalFailedResponseLatency metrics are the only latency metrics that match the duration from the access log.
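
To illustrate the difference between the two measurement points, here is a minimal sketch. It is not Vespa code: the `RequestTimeline` class, its field names, and the example timestamps are all hypothetical and only model the description above.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    """Hypothetical timestamps (in ms) for one request; not Vespa's data model."""
    received_ms: float            # request accepted by the HTTP server
    response_started_ms: float    # handler starts the response
    response_completed_ms: float  # response content produced and sent


def handled_latency(t: RequestTimeline) -> float:
    # Models handled.latency: stops at the start of the response.
    return t.response_started_ms - t.received_ms


def total_response_latency(t: RequestTimeline) -> float:
    # Models the serverTotal*ResponseLatency metrics and the access log
    # duration: also covers producing and sending the response content.
    return t.response_completed_ms - t.received_ms


# Made-up numbers: 40 ms until the response starts, 25 ms to render and send it.
timeline = RequestTimeline(received_ms=0.0,
                           response_started_ms=40.0,
                           response_completed_ms=65.0)
```

With these made-up numbers, `handled_latency(timeline)` is 40.0 ms while `total_response_latency(timeline)` is 65.0 ms; the 25 ms gap is the rendering and sending time that only the second measurement includes.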

bjorncs avatar May 15 '23 13:05 bjorncs

Resetting priority. This is part of a larger effort to evaluate all existing container metrics.

bjorncs avatar Jun 27 '23 13:06 bjorncs