envoy
envoy copied to clipboard
stream_info: add downstream handshake completion timing
Signed-off-by: Paul Sohn [email protected]
Commit Message: Adds a downstreamTiming field to the connection-level streamInfo for when the TLS handshake completed for HTTP 1/2/3, for monitoring handshake latency. Additional Description: Note that for HTTP/3 zero-RTT, the handshake may complete well into the life of the request, which is why this field is not copied into the stream-level streamInfo when a stream is created. Risk Level: Low Testing: Integration tests included in the PR. Docs Changes: None Release Notes: Added
As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
@pksohn this needs some formatting fixes
you can use the diff/artefacts here to fix
https://dev.azure.com/cncf/envoy/_build/results?buildId=113538&view=results
@ggreenway the checks are finally passing for this PR. Any chance you might be able to take a look? Thanks!
@ggreenway @alyssawilk One thing I meant to ask about:
I removed the assertion that the handshake timestamp can only be written once, in this commit: https://github.com/envoyproxy/envoy/pull/22367/commits/61a8618bc246063886be9eba6ecbd4e19521b57e, upon discovering that the SslSocketTest::ClientSessionResumption... tests were failing due to SslSocket::onSuccess being called multiple times.
In practice do we actually expect SslSocket::onSuccess to be called multiple times for the same connection (e.g. on resumption)? If so, would we want the handshake-complete timestamp to be overwritten?
@ggreenway @alyssawilk One thing I meant to ask about:
I removed the assertion that the handshake timestamp can only be written once, in this commit: 61a8618, upon discovering that the SslSocketTest::ClientSessionResumption... tests were failing due to SslSocket::onSuccess being called multiple times.
In practice do we actually expect SslSocket::onSuccess to be called multiple times for the same connection (e.g. on resumption)? If so, would we want the handshake-complete timestamp to be overwritten?
Offline with @alyssawilk: we don't expect this to happen in production; likely just an artifact of the SslSocketTest::ClientSessionResumption... tests, removing the assertion is fine.
@ggreenway ping!
Would you be willing to add an access log formatter in source/common/formatter/substitutions_formatter.cc, at least for tcp-proxy access logs? I think this new field is generally useful so I'd like to make it more easily available.
Thanks -- I think I've added what you were suggesting.
@pksohn can you fix CI? Then I'll ping the reviewers again to try to make some progress.
/wait
/wait on coverage
/retest
Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit
also @ggreenway can you take a look? I think I signed off on this one but not sure if your concerns are all resolved
@alyssawilk: there was a question above that we were hoping you could weigh in on (namely whether to move this metric into the SSL connection or leave it in downstream timing): https://github.com/envoyproxy/envoy/pull/22367#discussion_r941792000
I think you need to merge main to pick up #22792