opentelemetry-go icon indicating copy to clipboard operation
opentelemetry-go copied to clipboard

Move time.Now call into exemplar reservoir to improve performance

Open dashpole opened this issue 1 year ago • 8 comments

Part of addressing https://github.com/open-telemetry/opentelemetry-go/issues/5542.

Motivation

This removes the time.Now() call from filtered-out Exemplars by only invoking time.Now() after the filtering decision is made. This improvement is especially noticeable for measurements without any attributes.

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: AMD EPYC 7B12
                                                                 │   old.txt    │                new.txt                │
                                                                 │    sec/op    │   sec/op     vs base                  │
SyncMeasure/NoView/Int64Counter/Attributes/0-24                    158.20n ± 4%   99.83n ± 1%  -36.90% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/1-24                     333.3n ± 4%   274.8n ± 1%  -17.55% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/10-24                    1.640µ ± 1%   1.600µ ± 1%   -2.41% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/0-24                   159.0n ± 3%   101.3n ± 0%  -36.27% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/1-24                   340.0n ± 2%   272.0n ± 1%  -20.00% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/10-24                  1.661µ ± 1%   1.597µ ± 0%   -3.85% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0-24               159.8n ± 1%   103.1n ± 0%  -35.50% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1-24               339.5n ± 1%   273.1n ± 0%  -19.57% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10-24              1.656µ ± 0%   1.589µ ± 0%   -4.05% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0-24             159.3n ± 2%   100.8n ± 0%  -36.74% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1-24             337.9n ± 2%   271.8n ± 1%  -19.55% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10-24            1.657µ ± 0%   1.593µ ± 1%   -3.83% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/0-24                  144.65n ± 4%   89.38n ± 0%  -38.21% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/1-24                   235.7n ± 2%   183.5n ± 0%  -22.15% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/10-24                  900.8n ± 1%   836.8n ± 0%   -7.10% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/0-24                145.60n ± 5%   93.48n ± 1%  -35.80% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/1-24                 240.9n ± 1%   183.0n ± 0%  -24.06% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/10-24                905.6n ± 1%   826.3n ± 0%   -8.76% (p=0.000 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/0-24                   20.33n ± 0%   20.35n ± 0%        ~ (p=0.302 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/1-24                   26.46n ± 0%   26.45n ± 1%        ~ (p=0.868 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/10-24                  26.50n ± 0%   26.47n ± 0%        ~ (p=0.208 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/0-24                 20.34n ± 1%   20.27n ± 0%   -0.34% (p=0.009 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/1-24                 26.55n ± 0%   26.60n ± 1%        ~ (p=0.109 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/10-24                26.59n ± 1%   26.57n ± 1%        ~ (p=0.926 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/0-24             20.38n ± 1%   20.38n ± 0%        ~ (p=0.725 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/1-24             26.39n ± 0%   26.44n ± 0%        ~ (p=0.238 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/10-24            26.52n ± 0%   26.42n ± 0%   -0.36% (p=0.049 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/0-24           20.30n ± 0%   20.25n ± 0%        ~ (p=0.196 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/1-24           26.57n ± 0%   26.54n ± 1%        ~ (p=0.540 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/10-24          26.57n ± 0%   26.51n ± 1%        ~ (p=0.643 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/0-24                 20.37n ± 0%   20.36n ± 1%        ~ (p=1.000 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/1-24                 26.41n ± 0%   26.50n ± 0%   +0.32% (p=0.007 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/10-24                26.44n ± 0%   26.55n ± 1%   +0.42% (p=0.012 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/0-24               20.30n ± 0%   20.45n ± 0%   +0.74% (p=0.000 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/1-24               26.52n ± 0%   26.48n ± 0%        ~ (p=0.127 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/10-24              26.55n ± 0%   26.48n ± 0%   -0.26% (p=0.002 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/0-24             170.5n ± 2%   110.8n ± 0%  -35.03% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/1-24             402.5n ± 1%   331.5n ± 1%  -17.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/10-24            1.363µ ± 1%   1.281µ ± 1%   -6.02% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/0-24           170.6n ± 1%   111.5n ± 1%  -34.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/1-24           397.1n ± 1%   335.9n ± 0%  -15.41% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/10-24          1.371µ ± 1%   1.279µ ± 1%   -6.71% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/0-24       170.1n ± 1%   112.2n ± 0%  -34.09% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/1-24       397.5n ± 1%   330.2n ± 0%  -16.93% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/10-24      1.371µ ± 1%   1.289µ ± 1%   -5.95% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/0-24     171.4n ± 2%   112.9n ± 0%  -34.13% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/1-24     397.0n ± 3%   336.4n ± 0%  -15.24% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/10-24    1.383µ ± 1%   1.305µ ± 1%   -5.61% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Histogram/Attributes/0-24          157.30n ± 2%   98.58n ± 1%  -37.33% (p=0.000 n=6+10)

Changes

  • Introduce exemplar.Filter, which is a filter function based on the context. It will not be user-facing, so we can always add other parameters later if needed.
  • Introduce exemplar.FilteredReservoir, which is similar to a reservoir, except it does not receive a timestamp. It gets the current time after the filter decision has been made. It uses generics to avoid the call to exemplar.NewValue(), since it is internal-only.
  • The exemplar.Reservoir is left as-is, so that it can be made public when exemplars are stable. It still includes a timestamp argument.
  • Unit tests are updated to expect a much lower number of calls to time.Now
  • exemplar.Drop is now an exemplar.FilteredReservoir instead of a Reservoir, since we don't need a Reservoir to store things in if the measurement is always dropped.

dashpole avatar Jun 25 '24 19:06 dashpole

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.4%. Comparing base (1cefb17) to head (93ec4fa).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #5545     +/-   ##
=======================================
- Coverage   84.4%   84.4%   -0.1%     
=======================================
  Files        270     271      +1     
  Lines      22336   22333      -3     
=======================================
- Hits       18871   18868      -3     
  Misses      3134    3134             
  Partials     331     331             

see 8 files with indirect coverage changes

codecov[bot] avatar Jun 25 '24 19:06 codecov[bot]

The timestamp arg is something recommended by the specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplarreservoir

This may be necessary now, but when we accept a user defined reservoir this seems like an important parameter.

Would passing a zero value time from the SDK and check for this zero value in the reservoir to then call now be a better approach?

MrAlias avatar Jun 25 '24 20:06 MrAlias

I changed it to check for time.IsZero()

dashpole avatar Jun 25 '24 21:06 dashpole

Java doesn't appear to accept a timestamp (although it looks like it is internal): https://github.com/open-telemetry/opentelemetry-java/blob/0aacc55d1e3f5cc6dbb4f8fa26bcb657b01a7bc9/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ExemplarReservoir.java#L95, but JS does: https://github.com/open-telemetry/opentelemetry-js/blob/f99e7d9c2c37376e752f746308affa74fc7c30d2/packages/sdk-metrics/src/exemplar/ExemplarReservoir.ts#L33.

dashpole avatar Jun 25 '24 21:06 dashpole

@MrAlias I was re-reading the specification, and I think the intent is that Offer is only called when the exemplar is definitely going to be recorded (i.e. after the filter). We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.

dashpole avatar Jun 26 '24 15:06 dashpole

We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.

I'm not sure I fully understand this implementation, but in general this sounds correct to me.

Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?

MrAlias avatar Jun 26 '24 16:06 MrAlias

Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?

Yes.

dashpole avatar Jun 26 '24 16:06 dashpole

Please let me know if this is too large to review or too large for a last-minute change before a release. I can try to make a version which just fixes the issue with as small of a diff as possible.

dashpole avatar Jun 28 '24 14:06 dashpole