[Feature]: Lock-free Metrics Counting API
Related Problems?
The general problem is one of minimizing the performance overheads of instrumentation and delivering on the mission.
#1374
Describe the solution you'd like:
If the API could use Rust's type system to distinguish a "thread-local" context, where values are !Send, then updating metrics could be even cheaper than it is today, i.e. by not baking in the assumption of Arc<Mutex<State>> when Rc<RefCell<State>> would do.
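To make the contrast concrete, here is a minimal sketch of the two shapes. `State`, `SharedCounter` and `LocalCounter` are hypothetical names used only for illustration; they are not part of the current API.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

/// Hypothetical aggregation state; a real instrument would hold more than one sum.
#[derive(Default)]
pub struct State {
    total: u64,
}

/// Hypothetical shared counter: usable from any thread, but every add takes a lock.
#[derive(Clone, Default)]
pub struct SharedCounter {
    state: Arc<Mutex<State>>,
}

impl SharedCounter {
    pub fn add(&self, value: u64) {
        self.state.lock().unwrap().total += value;
    }

    pub fn value(&self) -> u64 {
        self.state.lock().unwrap().total
    }
}

/// Hypothetical thread-local counter: `Rc` makes the type !Send and !Sync,
/// so the compiler keeps it on one thread and no lock or atomic is needed.
#[derive(Clone, Default)]
pub struct LocalCounter {
    state: Rc<RefCell<State>>,
}

impl LocalCounter {
    pub fn add(&self, value: u64) {
        self.state.borrow_mut().total += value;
    }

    pub fn value(&self) -> u64 {
        self.state.borrow().total
    }
}
```

With this shape, moving a `LocalCounter` into `std::thread::spawn` is rejected at compile time because `Rc` is not `Send`, which is the distinction the type system would be enforcing.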
Considered Alternatives
Additional Context
No response
👍 It would be great to experiment with a LocalMeter or similar that could expose instruments that are neither Send nor Sync. They could still be picked up by global metric readers through channels or other means of communication; we would just need to be careful to aggregate them across threads, as instruments could have similar names/attributes.
Per-thread aggregators plus something that merges aggregations across threads would be awesome. (I tried implementing this for .NET; merging across threads during export was tricky to implement.)
Edit: Ignore this. After thinking about it more, it was not a great 2am thought.
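A rough sketch of the per-thread-aggregator idea, assuming string keys in place of real attribute sets and a plain `std::sync::mpsc` channel as the hand-off. `LocalAggregator`, `merge` and `run_workers` are hypothetical names, not existing SDK items.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-thread aggregator: a plain HashMap, no locks or atomics.
#[derive(Default)]
pub struct LocalAggregator {
    sums: HashMap<String, u64>,
}

impl LocalAggregator {
    pub fn add(&mut self, key: &str, value: u64) {
        *self.sums.entry(key.to_owned()).or_insert(0) += value;
    }

    /// Hand over the accumulated sums and start again from an empty map.
    pub fn take(&mut self) -> HashMap<String, u64> {
        std::mem::take(&mut self.sums)
    }
}

/// Merge per-thread snapshots; identical keys from different threads are summed.
pub fn merge(
    snapshots: impl IntoIterator<Item = HashMap<String, u64>>,
) -> HashMap<String, u64> {
    let mut merged = HashMap::new();
    for snapshot in snapshots {
        for (key, value) in snapshot {
            *merged.entry(key).or_insert(0) += value;
        }
    }
    merged
}

/// Each worker records locally, then sends a single snapshot over the channel.
pub fn run_workers(workers: usize, adds_per_worker: u64) -> HashMap<String, u64> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut local = LocalAggregator::default();
                for _ in 0..adds_per_worker {
                    local.add("requests", 1);
                }
                tx.send(local.take()).expect("receiver is alive");
            })
        })
        .collect();
    drop(tx); // the channel closes once every worker's sender is gone
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    merge(rx)
}
```

The merge step is where instruments with the same name/attributes on different threads get combined; a real exporter would also have to decide when each thread flushes.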
Ok here's a crazy thought. Keep in mind I'm still working my way through the code base and OpenTelemetry terminology, so feel free to tell me I'm way off.
Today when you create a Counter, it's ultimately given the Sum::measure() call as its Measure<T>. Sum::measure() in turn calls its internal ValueMap::measure(). The ValueMap is ultimately what holds all the attribute sets and their associated values, along with the locking mechanism.
Right now we heavily rely on locking because ValueMap is the choke point for aggregation. Bounded instruments can help alleviate this but that only works for stable attributes. So while it may not be completely lock free, what if we can utilize atomics to reduce how often we have to lock?
How does this look in practice? My current thought is along the lines of:
- Counter<T> is created. This creates its own HashMap<AttributeSet, Arc<Atomic<T>>>.
- Counter<T>.add() is changed to take a mutable reference to self (meaning you must have an exclusive reference to that specific counter instance in order to call add()).
- Calling Counter::add() looks up the passed-in attribute set in its local hash map.
  - If we have an entry for this attribute set, call fetch_add() on the atomic.
  - If we do not have an entry for it:
    - Call Sum::get_atomic(), which calls ValueMap::get_atomic().
    - This acquires a lock, looks up the attribute set in the hashmap, and returns the appropriate atomic (or creates one and returns it if one does not already exist).
    - The counter then stores the Arc<Atomic<T>> in its local hashmap and calls fetch_add() on it.
- When the metrics collection process occurs, Sum::delta and its other methods are invoked as normal. This will acquire a lock and fetch the value from each atomic in the hashmap.
- Whenever values need to be reset, swap() is called on each atomic.
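The steps in the list above could look roughly like the sketch below. It is simplified under a few assumptions: `AtomicU64` from std stands in for a generic `Atomic<T>`, `AttributeSet` is a stand-in type alias, the `Sum` layer is skipped, and `get_atomic` / `collect_delta` are hypothetical methods that do not exist in the code base today.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for an attribute set; the real SDK type is richer.
pub type AttributeSet = Vec<(String, String)>;

/// Stand-in for the shared `ValueMap`: the only place a lock is taken.
#[derive(Default)]
pub struct ValueMap {
    values: Mutex<HashMap<AttributeSet, Arc<AtomicU64>>>,
}

impl ValueMap {
    /// Hypothetical `get_atomic`: lock, then find or create the atomic for `attrs`.
    pub fn get_atomic(&self, attrs: &AttributeSet) -> Arc<AtomicU64> {
        let mut values = self.values.lock().unwrap();
        values.entry(attrs.clone()).or_default().clone()
    }

    /// Delta collection: lock the map and swap every atomic back to zero.
    pub fn collect_delta(&self) -> HashMap<AttributeSet, u64> {
        let values = self.values.lock().unwrap();
        values
            .iter()
            .map(|(attrs, atomic)| (attrs.clone(), atomic.swap(0, Ordering::Relaxed)))
            .collect()
    }
}

/// Counter with a private cache of the atomics it has already seen.
pub struct Counter {
    shared: Arc<ValueMap>,
    cache: HashMap<AttributeSet, Arc<AtomicU64>>,
}

impl Counter {
    pub fn new(shared: Arc<ValueMap>) -> Self {
        Counter { shared, cache: HashMap::new() }
    }

    /// `&mut self` gives exclusive access to the cache, so it needs no lock.
    pub fn add(&mut self, value: u64, attrs: &AttributeSet) {
        if let Some(atomic) = self.cache.get(attrs) {
            // Fast path: attribute set already seen, no lock involved.
            atomic.fetch_add(value, Ordering::Relaxed);
            return;
        }
        // Slow path: first use of this attribute set, goes through the lock once.
        let atomic = self.shared.get_atomic(attrs);
        atomic.fetch_add(value, Ordering::Relaxed);
        self.cache.insert(attrs.clone(), atomic);
    }
}
```

Because collection uses `swap()`, an add that races with collection lands either in the current delta or in the next one, so nothing is lost. The fast path still pays for hashing the attribute set on every call.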
There are some internal trait modifications that would need to go into making this work, but I think it follows the current architecture.
While this isn't lock free, Counters only encounter a lock when they have add() invoked with an attribute set they have never seen before. Caching a single Counter<T> per thread makes this interesting. So there could be lock contention early on in the application's life while each counter gets distinct properties, but it should even out eventually.
Likewise, the metrics collection process will lock the whole ValueMap still, but that lock does not prevent incrementing any counters for attribute sets those counters already have.
This also enables some interesting scenarios, such as cleanup. When you drop a Counter, it drops all of the Arc<Atomic<T>> instances it contains. So when metrics collection/reset happens, it can check whether Arc::strong_count == 1 for each atomic in the ValueMap. If so, then it knows it holds the only reference to that atomic and can clear it from its list, as it knows the attribute set is currently not used by any counters.
This also dovetails into bounded instruments, because a bounded instrument is just a single counter with a single Arc<Atomic<T>>. So it has no hashmap lookups and just increments the counter as needed without any locks (and has barely any other logic besides that).
This is all predicated on atomics being performant enough, and on the atomig crate working well (since otherwise we don't have generic atomics).
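A small sketch of that cleanup check, assuming the collector has exclusive access to the map (for example, while holding the ValueMap lock) and using `String` keys and `AtomicU64` as stand-ins. `collect_and_prune` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Export a delta for every entry, then drop entries no counter references any more.
pub fn collect_and_prune(
    values: &mut HashMap<String, Arc<AtomicU64>>,
) -> Vec<(String, u64)> {
    let mut exported = Vec::new();
    values.retain(|attrs, atomic| {
        // A strong count of 1 means the map holds the only reference. New clones
        // can only be handed out by the map itself, which we borrow exclusively,
        // so no counter can still write to this atomic.
        let unused = Arc::strong_count(&*atomic) == 1;
        // Read the final value before deciding to drop the entry.
        let delta = atomic.swap(0, Ordering::Relaxed);
        exported.push((attrs.clone(), delta));
        !unused
    });
    exported
}
```

Checking the count before the final `swap()` matters: the last value written by a dropped counter is still exported once before its entry disappears.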
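For illustration, a bounded instrument under this scheme could be as small as the following, again assuming `AtomicU64` in place of a generic atomic. `BoundCounter` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical bound counter: one pre-resolved atomic, no attribute lookup.
#[derive(Clone)]
pub struct BoundCounter {
    value: Arc<AtomicU64>,
}

impl BoundCounter {
    /// `value` would be the atomic handed out once for a fixed attribute set.
    pub fn new(value: Arc<AtomicU64>) -> Self {
        BoundCounter { value }
    }

    /// A single atomic add: no hashing, no lock.
    pub fn add(&self, amount: u64) {
        self.value.fetch_add(amount, Ordering::Relaxed);
    }
}
```

Since the only state is an `Arc<AtomicU64>`, `add()` can take `&self` and clones of the counter can be used from several threads.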
+lgtm
Any high-performance application, such as a proxy or a network switch, needs thread-local metrics that are merged on global collection. Maybe even something like a thread-local, lock-free tracer?
Maybe something similar to the LocalCounter implemented in tikv-prometheus?