armeria icon indicating copy to clipboard operation
armeria copied to clipboard

Add timing information for acquiring an existing connection

Open jrhee17 opened this issue 2 years ago • 4 comments

Currently, client-side provides some timing information when a new connection is created. https://armeria.dev/docs/advanced-structured-logging#client-connection-timing-properties

This information is exposed with the naming connectionAcquisitionDurationNanos. However, these metrics are recorded only for cases where a new connection is created.

It has been pointed out that the naming is confusing since this metric isn't recorded when an existing connection is used. We may do one of the following:

  1. Add a new timing to ClientConnectionTimings
  • Add a connection timing info
    • e.g. existingConnectionAcquisitionDurationNanos
  • Measure the time to query from the connection pool (https://github.com/line/armeria/blob/b78d9515dc8b4b997dd65787f717443b38a21869/core/src/main/java/com/linecorp/armeria/client/HttpClientDelegate.java#L170-L174))
  • Note that whether the acquisition is successful or not, the duration will be recorded.
  1. Reuse pendingAcquisitionDurationNanos
  • We don't have to worry about naming, but the logic to whether start/end the metric may be complex.
  1. Don't define a separate timing and just end the timings
  • This may be achieved by simply passing the built timings
    • https://github.com/line/armeria/blob/b78d9515dc8b4b997dd65787f717443b38a21869/core/src/main/java/com/linecorp/armeria/client/HttpClientDelegate.java#L176

I propose we go with option 1 might be simpest/least prone to error. I think we also want to expose this metric to RequestMetricSupport by default.

Note: As pointed out by @ikhoon, doing this would also fix a bug where dns timing information wasn't collected if there is a healthy connection already available.

ref: https://line-armeria.slack.com/archives/C1NGPBUH2/p1684469502484689

jrhee17 avatar May 22 '23 06:05 jrhee17

Thanks @jrhee17 for creating this issue!

I agree that option 1 is the most reasonable option. Here are some of my suggestions on option (and please FFTI):

  • existingConnectionAcquisitionDurationNanos is a good name. I'd also suggest renaming connectionAcquisitionDurationNanos to newConnectionAcquisitionDurationNanos. This way, the name is the most clear.
  • I understand that this^ renaming could be a breaking API change as ClientConnectionTimings is probably considered as a public API. So, maybe consider this renaming as part of the next major release. But before that happens, it'd be very nice to update all Javadocs/comments about connectionAcquisitionDurationNanos to clearly point out that this duration is only about acquiring/creating a NEW connection.

Lincong avatar May 22 '23 16:05 Lincong

I'd also suggest renaming connectionAcquisitionDurationNanos to newConnectionAcquisitionDurationNanos. This way, the name is the most clear.

Rather, I was thinking that existingConnectionAcquisitionDurationNanos would be included in connectionAcquisitionDurationNanos.

  1. If a readily existing connection exists, existingConnectionAcquisitionDurationNanos > 0
  2. If a connection is in the progress of creation, pendingAcquisitionDurationNanos > 0
  3. If a connection is created, socketConnectDurationNanos > 0

Essentially, connectionAcquisitionDurationNanos = existingConnectionAcquisitionDurationNanos + pendingAcquisitionDurationNanos + socketConnectDurationNanos so semantically I was thinking the naming is fine as-is.

jrhee17 avatar May 23 '23 01:05 jrhee17

Note: @ikhoon also suggested that we change the naming of the metric collected in RequestMetricSupport: https://github.com/line/armeria/blob/f2116739018c15f230dd8f0e5a4dd5db1fefc888/core/src/main/java/com/linecorp/armeria/internal/common/metric/RequestMetricSupport.java#L289-L296

He suggested that we use tags to differentiate connection acquisition e.g. connection.acquisition.duration

  • acquisition_type
    • new
    • piggyback (pending might be more appropriate)
    • reuse or existing

To avoid breaking changes, we may just add another metric at the cost of extra metric exported. (and deprecate the old metric. Deprecation may be propagated via javadocs and documentation to be added)

We should also update documentation since the relationship between values in ConnectionTimings and the metric names may not be clear.

jrhee17 avatar May 23 '23 01:05 jrhee17

If the default metrics are more accurately expressed/exposed by RequestMetricsSupport, I think it will be possible to reduce the hassle of creating custom metrics by accessing ClientConnectionTimings.

The new metric will provide more accurate information about connection acquisition.

ikhoon avatar May 23 '23 05:05 ikhoon