Add headers pending flush to audit stats
Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactor to ensure the time metric is reported on stream teardown (whatever the cause)? i.e. on destruction if the time is Some e.g.
if let Some(first) =
ctx.first_full_headers_flush_fail_time.take()
{
ctx.audit_stats.add_header_flush_duration(
Instant::now().duration_since(first),
);
}
If so, we'd want to take avoid putting self.headers_pending_flush.fetch_sub(1, Ordering::SeqCst); in add_header_flush_duration()
Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactor to ensure the time metric is reported on stream teardown (whatever the cause)? i.e. on destruction if the time is Some e.g.
if let Some(first) = ctx.first_full_headers_flush_fail_time.take() { ctx.audit_stats.add_header_flush_duration( Instant::now().duration_since(first), ); }If so, we'd want to take avoid putting
self.headers_pending_flush.fetch_sub(1, Ordering::SeqCst);inadd_header_flush_duration()
Reporting a duration on other teardown reasons seems fine to me, but I think we'ld like to also indicate if the flush succeeded or the stream was cancelled. We do not want to mix together the success and failure cases.
Ultimately we want to detect cases where the connection has data to send but cannot make progress. The reasons why we may not be able to make progress include flow control, congestion control or other reasons.
Right. The duration is cumulative across all HEADERS, while being actively blocked can happen only on an individual HEADERS. For example, 103 early hints, 200 OK, trailers. The audit stats already capture if the stream was stopped/reset/fin, apps can use that information in combination with your new metric introduced in the PR to figure out the terminal status.
I think there's a difference between the case that the headers were sent after some delay and the case headers were never sent after a delay. Even if the client action is to reset the stream before FIN is sent.
I agree, that's why I think it would be better to switch the metric proposed here to a bool that indicates if HEADERS were blocked, or not, at the time the stream is terminated.
Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactor to ensure the time metric is reported on stream teardown (whatever the cause)? i.e. on destruction if the time is Some e.g.
if let Some(first) = ctx.first_full_headers_flush_fail_time.take() { ctx.audit_stats.add_header_flush_duration( Instant::now().duration_since(first), ); }If so, we'd want to take avoid putting
self.headers_pending_flush.fetch_sub(1, Ordering::SeqCst);inadd_header_flush_duration()
Attempted to record blocked time in cases where the stream terminates. I'll look into some more testing on Monday.