opentelemetry-go
opentelemetry-go copied to clipboard
Move time.Now call into exemplar reservoir to improve performance
Part of addressing https://github.com/open-telemetry/opentelemetry-go/issues/5542.
Motivation
This removes the time.Now() call from filtered-out Exemplars by only invoking time.Now() after the filtering decision is made. This improvement is especially noticeable for measurements without any attributes.
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: AMD EPYC 7B12
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
SyncMeasure/NoView/Int64Counter/Attributes/0-24 158.20n ± 4% 99.83n ± 1% -36.90% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/1-24 333.3n ± 4% 274.8n ± 1% -17.55% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/10-24 1.640µ ± 1% 1.600µ ± 1% -2.41% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/0-24 159.0n ± 3% 101.3n ± 0% -36.27% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/1-24 340.0n ± 2% 272.0n ± 1% -20.00% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/10-24 1.661µ ± 1% 1.597µ ± 0% -3.85% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0-24 159.8n ± 1% 103.1n ± 0% -35.50% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1-24 339.5n ± 1% 273.1n ± 0% -19.57% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10-24 1.656µ ± 0% 1.589µ ± 0% -4.05% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0-24 159.3n ± 2% 100.8n ± 0% -36.74% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1-24 337.9n ± 2% 271.8n ± 1% -19.55% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10-24 1.657µ ± 0% 1.593µ ± 1% -3.83% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/0-24 144.65n ± 4% 89.38n ± 0% -38.21% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/1-24 235.7n ± 2% 183.5n ± 0% -22.15% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/10-24 900.8n ± 1% 836.8n ± 0% -7.10% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/0-24 145.60n ± 5% 93.48n ± 1% -35.80% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/1-24 240.9n ± 1% 183.0n ± 0% -24.06% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/10-24 905.6n ± 1% 826.3n ± 0% -8.76% (p=0.000 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/0-24 20.33n ± 0% 20.35n ± 0% ~ (p=0.302 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/1-24 26.46n ± 0% 26.45n ± 1% ~ (p=0.868 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/10-24 26.50n ± 0% 26.47n ± 0% ~ (p=0.208 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/0-24 20.34n ± 1% 20.27n ± 0% -0.34% (p=0.009 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/1-24 26.55n ± 0% 26.60n ± 1% ~ (p=0.109 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/10-24 26.59n ± 1% 26.57n ± 1% ~ (p=0.926 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/0-24 20.38n ± 1% 20.38n ± 0% ~ (p=0.725 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/1-24 26.39n ± 0% 26.44n ± 0% ~ (p=0.238 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/10-24 26.52n ± 0% 26.42n ± 0% -0.36% (p=0.049 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/0-24 20.30n ± 0% 20.25n ± 0% ~ (p=0.196 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/1-24 26.57n ± 0% 26.54n ± 1% ~ (p=0.540 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/10-24 26.57n ± 0% 26.51n ± 1% ~ (p=0.643 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/0-24 20.37n ± 0% 20.36n ± 1% ~ (p=1.000 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/1-24 26.41n ± 0% 26.50n ± 0% +0.32% (p=0.007 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/10-24 26.44n ± 0% 26.55n ± 1% +0.42% (p=0.012 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/0-24 20.30n ± 0% 20.45n ± 0% +0.74% (p=0.000 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/1-24 26.52n ± 0% 26.48n ± 0% ~ (p=0.127 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/10-24 26.55n ± 0% 26.48n ± 0% -0.26% (p=0.002 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/0-24 170.5n ± 2% 110.8n ± 0% -35.03% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/1-24 402.5n ± 1% 331.5n ± 1% -17.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/10-24 1.363µ ± 1% 1.281µ ± 1% -6.02% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/0-24 170.6n ± 1% 111.5n ± 1% -34.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/1-24 397.1n ± 1% 335.9n ± 0% -15.41% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/10-24 1.371µ ± 1% 1.279µ ± 1% -6.71% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/0-24 170.1n ± 1% 112.2n ± 0% -34.09% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/1-24 397.5n ± 1% 330.2n ± 0% -16.93% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/10-24 1.371µ ± 1% 1.289µ ± 1% -5.95% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/0-24 171.4n ± 2% 112.9n ± 0% -34.13% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/1-24 397.0n ± 3% 336.4n ± 0% -15.24% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/10-24 1.383µ ± 1% 1.305µ ± 1% -5.61% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Histogram/Attributes/0-24 157.30n ± 2% 98.58n ± 1% -37.33% (p=0.000 n=6+10)
Changes
- Introduce
exemplar.Filter, which is a filter function based on the context. It will not be user-facing, so we can always add other parameters later if needed. - Introduce
exemplar.FilteredReservoir, which is similar to a reservoir, except it does not receive a timestamp. It gets the current time after the filter decision has been made. It uses generics to avoid the call to exemplar.NewValue(), since it is internal-only. - The
exemplar.Reservoiris left as-is, so that it can be made public when exemplars are stable. It still includes a timestamp argument. - Unit tests are updated to expect a much lower number of calls to time.Now
exemplar.Dropis now anexemplar.FilteredReservoirinstead of aReservoir, since we don't need a Reservoir to store things in if the measurement is always dropped.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 84.4%. Comparing base (
1cefb17) to head (93ec4fa).
Additional details and impacted files
@@ Coverage Diff @@
## main #5545 +/- ##
=======================================
- Coverage 84.4% 84.4% -0.1%
=======================================
Files 270 271 +1
Lines 22336 22333 -3
=======================================
- Hits 18871 18868 -3
Misses 3134 3134
Partials 331 331
The timestamp arg is something recommended by the specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplarreservoir
This may be necessary now, but when we accept a user defined reservoir this seems like an important parameter.
Would passing a zero value time from the SDK and check for this zero value in the reservoir to then call now be a better approach?
I changed it to check for time.IsZero()
Java doesn't appear to accept a timestamp (although it looks like it is internal): https://github.com/open-telemetry/opentelemetry-java/blob/0aacc55d1e3f5cc6dbb4f8fa26bcb657b01a7bc9/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ExemplarReservoir.java#L95, but JS does: https://github.com/open-telemetry/opentelemetry-js/blob/f99e7d9c2c37376e752f746308affa74fc7c30d2/packages/sdk-metrics/src/exemplar/ExemplarReservoir.ts#L33.
@MrAlias I was re-reading the specification, and I think the intent is that Offer is only called when the exemplar is definitely going to be recorded (i.e. after the filter). We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.
We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.
I'm not sure I fully understand this implementation, but in general this sounds correct to me.
Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?
Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?
Yes.
Please let me know if this is too large to review or too large for a last-minute change before a release. I can try to make a version which just fixes the issue with as small of a diff as possible.