opentelemetry-rust
opentelemetry-rust copied to clipboard
Add API support for bound instruments
Fixes # Design discussion issue (if applicable) #1374
Changes
This PR adds API support for bound instruments for Counter
and UpDownCounter
. Bound instruments are counters which always change the value for a specific and predefined attribute set. The focus on this change is creating the initial API surface for bound instruments, and the mechanisms to allow non-bound instruments to become bound (since there is no direct path from aggregates to counters, it's all through closures).
To facilitate counters creating bound instruments that can refer back to their source aggregates, aggregates return a BoundMeasureGenerator<T>
closure that is provided alongside the Measure<T>
for the aggregate. This closure can then be invoked in order to return a BoundMeasure<T>
, which provides the mechanism to invoke a measure for the cached attribute set.
This PR is not performance focused. It should not cause regressions but using bound instruments from this PR may not cause increased throughput. Performance enhancements that take advantage of bound instruments will come in a follow-up PR once we have an agreed upon API and generation path.
Benchmarks/Stress Tests
Main
Counter_Add_Sorted time: [717.94 ns 723.06 ns 729.17 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
Counter_Add_Unsorted time: [711.21 ns 714.89 ns 719.15 ns]
Found 12 outliers among 100 measurements (12.00%)
8 (8.00%) high mild
4 (4.00%) high severe
stress test: metrics
Number of threads: 6
Throughput: 6,367,800 iterations/sec
Throughput: 6,581,800 iterations/sec
Throughput: 6,736,800 iterations/sec
Throughput: 6,720,200 iterations/sec
Throughput: 6,804,600 iterations/sec
PR
Counter_Add_Sorted time: [712.98 ns 715.33 ns 717.99 ns]
change: [-0.7779% +0.3569% +1.5923%] (p = 0.60 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Counter_Add_Unsorted time: [705.13 ns 707.71 ns 710.82 ns]
change: [-2.0962% -0.9670% +0.0233%] (p = 0.09 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
Counter_Bounded time: [133.96 ns 134.82 ns 135.86 ns]
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
stress test: metrics
Number of threads: 6
Throughput: 6,173,200 iterations/sec
Throughput: 6,381,600 iterations/sec
Throughput: 6,250,000 iterations/sec
Throughput: 6,434,000 iterations/sec
stress test: metrics_bounded
Number of threads: 6
Throughput: 6,473,400 iterations/sec
Throughput: 6,459,600 iterations/sec
Throughput: 6,339,200 iterations/sec
Throughput: 6,369,200 iterations/sec
Merge requirement checklist
- [X] CONTRIBUTING guidelines followed
- [X] Unit tests added/updated (if applicable)
- [ ] Appropriate
CHANGELOG.md
files updated for non-trivial, user-facing changes - [ ] Changes in public API reviewed (if applicable)
The api will change slightly when/if #1421 get in to take an attribute set instead of a slice of key value pairs, but I didn't want to couple both together. Bounded performance might end up getting a bit better since cloning()
becomes cheaper, but performance isn't the point of this PR :)
Codecov Report
Attention: 73 lines
in your changes are missing coverage. Please review.
Comparison is base (
8688e7e
) 61.8% compared to head (cd3be86
) 62.2%. Report is 4 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1444 +/- ##
=======================================
+ Coverage 61.8% 62.2% +0.3%
=======================================
Files 144 144
Lines 19810 20152 +342
=======================================
+ Hits 12256 12538 +282
- Misses 7554 7614 +60
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Like the general direction, kinda surprised not to see much of a performance gain over standard counters. I feel like (not related to this change) we have a lot of Arc spread through the code that maybe we could avoid. A bit of locking our locks so we can lock more locks.
Arc
contains no locking. It's an atomic number that tracks how many references there are. When you clone()
it increases the atomic count, and when you drop()
it decrements the atomic count (and if it's the last reference it drops the underlying value). There's no locking involved in cloning, dropping, or accessing an Arc
, and a lock is only involved if you access an internal function of the held type that invokes a lock (e.g. a mutex held by the arc).
The reason this doesn't provide any performance benefits is partially because #1421 is not yet merged. Since we don't have that change in, we must do a hard clone of the attribute set every time a bound measure is called (and for each measure in the ResolvedMeasure
. Without #1421 it's cloning the whole vector and doing extra allocations.
There's no performance benefit too because this PR has bound instruments as a thin wrapper around normal measurements. Once this PR is merged, the next PR fully takes advantage of bound instruments by creating an atomic that's incremented by the bound measure function. This allows the BoundCounter.add()
calls to increment with zero mutex locking and zero hashmap lookups. That's what this PR is setting the stage for.
@cijothomas Can we close this PR if you plan to open a new PR for bound instruments inline with the changes here?
@cijothomas Can we close this PR if you plan to open a new PR for bound instruments inline with the changes here?
Yes closing this. Cross-linking to https://github.com/open-telemetry/opentelemetry-rust/issues/1374, so its is easy to find the work in this PR to adopt.