opentelemetry-rust [Feature]: Lock-free Metrics Counting API

Describe the solution you'd like:

If the API could use Rust's type system to distinguish a "thread-local" context, where values are !Send, then updating metrics could be even faster than it is today, i.e. by not baking in the assumption of Arc<Mutex<State>> when Rc<RefCell<State>> would do.
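To make the contrast concrete, here is a minimal sketch (not the opentelemetry-rust API). `State`, `SharedCounter`, and `LocalCounter` are hypothetical names used only for illustration, and attribute sets are reduced to a single `&'static str` key.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

/// Hypothetical aggregation state: one running sum per attribute key.
#[derive(Default)]
struct State {
    sums: HashMap<&'static str, u64>,
}

/// Shareable across threads: every `add` pays for a mutex lock.
#[derive(Clone, Default)]
struct SharedCounter(Arc<Mutex<State>>);

impl SharedCounter {
    fn add(&self, key: &'static str, value: u64) {
        let mut state = self.0.lock().unwrap();
        *state.sums.entry(key).or_insert(0) += value;
    }

    fn get(&self, key: &'static str) -> u64 {
        self.0.lock().unwrap().sums.get(key).copied().unwrap_or(0)
    }
}

/// Thread-local: `Rc` makes this type `!Send` and `!Sync`, so the compiler
/// rejects moving it to another thread and no lock is needed.
#[derive(Clone, Default)]
struct LocalCounter(Rc<RefCell<State>>);

impl LocalCounter {
    fn add(&self, key: &'static str, value: u64) {
        *self.0.borrow_mut().sums.entry(key).or_insert(0) += value;
    }

    fn get(&self, key: &'static str) -> u64 {
        self.0.borrow().sums.get(key).copied().unwrap_or(0)
    }
}

fn local_counter_demo() -> u64 {
    let counter = LocalCounter::default();
    let handle = counter.clone(); // non-atomic reference count bump
    counter.add("requests", 1);
    handle.add("requests", 2);
    // std::thread::spawn(move || handle.add("requests", 1));
    // ^ would not compile: `Rc<RefCell<State>>` is not `Send`.
    counter.get("requests")
}
```

The local variant still needs some way to hand its totals to an exporter, which this sketch leaves out.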

Considered Alternatives

Additional Context

No response

Nov 21 '23 17:11 shaun-cox

👍 It would be great to experiment with a LocalMeter or similar that could expose instruments that are neither Send nor Sync. They could still get picked up by global metric readers through channels or other means of communication; we would just need to be careful to aggregate them across threads, as instruments could have similar names/attributes.

Nov 21 '23 17:11 jtescher

Per-thread aggregators, plus something that merges aggregations across the threads, would be awesome. (I tried implementing this for .NET; the merging across threads during export was tricky to implement.)
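A rough sketch of that shape, using only the standard library: each worker aggregates into a plain local map with no synchronization, then ships a snapshot over a channel, and the collector merges snapshots by key. `Snapshot`, `merge`, and `collect_from_workers` are made-up names, and a real exporter would take periodic snapshots rather than a single one at thread exit.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-thread aggregation: attribute key -> running sum.
type Snapshot = HashMap<&'static str, u64>;

/// Merge one thread's snapshot into the global totals, summing matching keys.
fn merge(total: &mut Snapshot, part: Snapshot) {
    for (key, value) in part {
        *total.entry(key).or_insert(0) += value;
    }
}

fn collect_from_workers(workers: u64, adds_per_worker: u64) -> Snapshot {
    let (tx, rx) = mpsc::channel::<Snapshot>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Hot path: a thread-owned map, no locks or atomics.
                let mut local = Snapshot::new();
                for _ in 0..adds_per_worker {
                    *local.entry("requests").or_insert(0) += 1;
                }
                // Hand the aggregated values to the collector.
                tx.send(local).expect("collector hung up");
            })
        })
        .collect();

    // Drop the original sender so the receive loop ends when workers finish.
    drop(tx);

    let mut total = Snapshot::new();
    for part in rx {
        merge(&mut total, part);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```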

Nov 21 '23 18:11 cijothomas

Edit: Ignore this. After thinking about it more, it was not a great 2am thought.

Ok here's a crazy thought. Keep in mind I'm still working my way through the code base and OpenTelemetry terminology, so feel free to tell me I'm way off.

Today when you create a Counter, it's ultimately given the Sum::measure() call as its Measure<T>. Sum::measure() in turn calls its internal ValueMap::measure(). The ValueMap is ultimately what holds all the attribute sets and their associated values, along with the locking mechanism.

Right now we heavily rely on locking because ValueMap is the choke point for aggregation. Bounded instruments can help alleviate this but that only works for stable attributes. So while it may not be completely lock free, what if we can utilize atomics to reduce how often we have to lock?

How does this look in practice? My current thought is along the lines of:

- Counter<T> is created. This creates its own HashMap<AttributeSet, Arc<Atomic<T>>>.
- Counter<T>.add() is changed to take a mutable reference to self (meaning you must have an exclusive reference to that specific counter instance in order to call add()).
- Calling Counter::add() looks up the passed-in attribute set in its local hash map.
  - If we have an entry for this attribute set, call fetch_add() on the atomic.
  - If we do not have an entry for it:
    - Call Sum::get_atomic(), which calls ValueMap::get_atomic().
    - This acquires a lock, looks up the attribute set in the hashmap, and returns the appropriate atomic (or creates one and returns it if one does not already exist).
    - The counter then stores the Arc<Atomic<T>> in its local hashmap and calls fetch_add() on it.
- When the metrics collection process occurs, Sum::delta and its other methods are invoked as normal. This will acquire a lock and fetch the value from each atomic in the hashmap.
- Whenever values need to be reset, swap() is called on each atomic.
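The steps above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: it uses std's `AtomicU64` instead of a generic `Atomic<T>`, `AttributeSet` is a stand-in type alias, and `get_atomic` / `collect_delta` are the hypothetical methods from the proposal, not existing SDK functions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for the SDK's attribute set type (assumption for this sketch).
type AttributeSet = Vec<(&'static str, &'static str)>;

/// Shared aggregation point, standing in for `ValueMap`.
#[derive(Default)]
struct ValueMap {
    values: Mutex<HashMap<AttributeSet, Arc<AtomicU64>>>,
}

impl ValueMap {
    /// Slow path: take the lock, then return (or create) the shared atomic.
    fn get_atomic(&self, attrs: &AttributeSet) -> Arc<AtomicU64> {
        let mut values = self.values.lock().unwrap();
        values.entry(attrs.clone()).or_default().clone()
    }

    /// Delta collection: lock the map and swap every atomic back to zero.
    fn collect_delta(&self) -> HashMap<AttributeSet, u64> {
        let values = self.values.lock().unwrap();
        values
            .iter()
            .map(|(attrs, atomic)| (attrs.clone(), atomic.swap(0, Ordering::Relaxed)))
            .collect()
    }
}

/// Counter with a private cache of atomics it has already seen.
struct Counter {
    shared: Arc<ValueMap>,
    cache: HashMap<AttributeSet, Arc<AtomicU64>>,
}

impl Counter {
    fn new(shared: Arc<ValueMap>) -> Self {
        Counter { shared, cache: HashMap::new() }
    }

    /// `&mut self` because the local cache may be updated.
    fn add(&mut self, value: u64, attrs: &AttributeSet) {
        if let Some(atomic) = self.cache.get(attrs) {
            // Fast path: no lock, just an atomic add.
            atomic.fetch_add(value, Ordering::Relaxed);
            return;
        }
        // First time this counter sees the attribute set: lock once, then cache.
        let atomic = self.shared.get_atomic(attrs);
        atomic.fetch_add(value, Ordering::Relaxed);
        self.cache.insert(attrs.clone(), atomic);
    }
}

fn cached_counter_demo() -> (u64, u64) {
    let shared = Arc::new(ValueMap::default());
    let mut a = Counter::new(Arc::clone(&shared));
    let mut b = Counter::new(Arc::clone(&shared));
    let attrs: AttributeSet = vec![("route", "/home")];
    a.add(1, &attrs); // slow path for `a`
    a.add(2, &attrs); // fast path for `a`
    b.add(4, &attrs); // slow path for `b`, same shared atomic
    let first = shared.collect_delta()[&attrs];
    let second = shared.collect_delta()[&attrs];
    (first, second)
}
```

Both counters end up incrementing the same shared atomic, so the first collection sees the combined total and the second sees zero after the reset.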

There are some internal trait modifications that would need to go into making this work, but I think it follows the current architecture.

While this isn't lock free, Counters only encounter a lock when they have add() invoked with an attribute set they have never seen before. Caching a single Counter<T> per thread makes this interesting. So there could be lock contention early on in the application's life while each counter gets distinct properties, but it should even out eventually.

Likewise, the metrics collection process will lock the whole ValueMap still, but that lock does not prevent incrementing any counters for attribute sets those counters already have.

This also enables some interesting scenarios, such as cleanup. When you drop a Counter, it drops all of the Arc<Atomic<T>> instances it contains. So when metrics collection/reset happens, it can check whether Arc::strong_count == 1 for each atomic held by the ValueMap. If so, then it knows it has the only reference to that atomic and can clear it from its list, as it knows the attribute set is currently not used by any counters.
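A small sketch of that cleanup pass, assuming the caller already holds the ValueMap lock (modelled here as a `&mut HashMap`). `collect_and_prune` is a hypothetical helper, keys are plain `String`s for brevity, and a real implementation would need to think harder about memory ordering between the final add and the reference-count check.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Report a delta for every entry, then drop entries no counter still holds.
fn collect_and_prune(values: &mut HashMap<String, Arc<AtomicU64>>) -> HashMap<String, u64> {
    let mut report = HashMap::new();
    values.retain(|attrs, atomic| {
        // Check the count first: if the map holds the only reference, nobody
        // can add to this atomic any more, so the swap reads its final value.
        let still_used = Arc::strong_count(&*atomic) > 1;
        let delta = atomic.swap(0, Ordering::Relaxed);
        report.insert(attrs.clone(), delta);
        still_used
    });
    report
}

fn prune_demo() -> (HashMap<String, u64>, usize) {
    let mut values = HashMap::new();

    // A counter that is still alive keeps its own handle to the atomic.
    let live = Arc::new(AtomicU64::new(0));
    values.insert("live".to_string(), Arc::clone(&live));
    live.fetch_add(2, Ordering::Relaxed);

    // A counter that recorded a value and was then dropped.
    let stale = Arc::new(AtomicU64::new(0));
    values.insert("stale".to_string(), Arc::clone(&stale));
    stale.fetch_add(5, Ordering::Relaxed);
    drop(stale);

    let report = collect_and_prune(&mut values);
    (report, values.len())
}
```

The stale entry is still reported once, so its last increments are not lost, and only then removed from the map.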

This also dovetails into bounded instruments, because a bounded instrument is just a single counter with a single Arc<Atomic<T>>. So it has no hashmap lookups and just increments the counter as needed without any locks (and has barely any other logic besides that).
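For illustration, a bound counter under this scheme might be as small as the sketch below. `BoundCounter` is a hypothetical type, and `AtomicU64` again stands in for a generic atomic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical bound instrument: one pre-resolved atomic, no map lookup.
struct BoundCounter {
    value: Arc<AtomicU64>,
}

impl BoundCounter {
    fn add(&self, value: u64) {
        // The entire hot path is a single atomic add.
        self.value.fetch_add(value, Ordering::Relaxed);
    }
}

fn bound_counter_demo() -> u64 {
    // In the proposal this atomic would come from the ValueMap, once, at bind time.
    let shared = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let bound = BoundCounter { value: Arc::clone(&shared) };
            thread::spawn(move || {
                for _ in 0..1000 {
                    bound.add(1);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    shared.load(Ordering::Relaxed)
}
```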

This is all predicated on atomics being performant enough, and on the atomig crate working well (since otherwise we don't have generic atomics).

Nov 22 '23 07:11 KallDrexx

+lgtm

Any high-performance application, such as a proxy or network switch, needs thread-local metrics that are merged on global collect. Maybe even a local lock-free tracer as well?

Dec 15 '23 03:12 fly3366

Maybe something like the LocalCounter implemented in tikv-prometheus?

Dec 15 '23 03:12 fly3366
