opentelemetry-dotnet
opentelemetry-dotnet copied to clipboard
Metric AggregatorStore improvements
Opening an issue to track future improvements.
The current AggStore uses 2 data structures to find the MetricPoint. One is an pre-allocated array of MetricPoint, where the actual points are stored, and then a concurrentdictionary (2 level), to find the index within the array for a given keys/values combinations.
- The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
- Replace the inner concurrent dictionary with hashtable to avoid dual lock on writes - https://github.com/open-telemetry/opentelemetry-dotnet/pull/2339#discussion_r706997727
- Design a totally new data structure specialized for the purpose.
- Special case 0 tags - see discussion : https://github.com/open-telemetry/opentelemetry-dotnet/pull/2339#discussion_r711217242
Given this is purely internal implementation without any public API, this can be addressed after other high priority asks.
- The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
I like this design. However, there might be a bug here. Let's say we add the first measurement of A with attributes of (c = 3, b=2, a=1), then another one B with attributes of (c=3, a=1, b=2). Based on the description above, the second lookup will fail. And then we sort the attributes of B and get (a=1,b=2,c=3). Note that it's not the same as the attributes of A, and we have to create a new metric here, which is obviously not what we want.
- The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
I like this design. However, there might be a bug here. Let's say we add the first measurement of
Awith attributes of(c = 3, b=2, a=1), then another oneBwith attributes of(c=3, a=1, b=2). Based on the description above, the second lookup will fail. And then we sort the attributes ofBand get(a=1,b=2,c=3). Note that it's not the same as the attributes ofA, and we have to create a new metric here, which is obviously not what we want.
Not sure I fully understood the bug. When (c = 3, b=2, a=1) is encountered, its lookup will fail. Then its sorted an another lookup is done, which also fails. Then both original order and sorted order is inserted to dictionary. Next, (c=3, a=1, b=2) comes. Its initial lookup fails. Then sorted lookup occurs, which succeeds.
(btw, this is not implemented yet)
... Then both original order and sorted order is inserted to dictionary...
Now I see. Thanks for your explanation. I thought only the original order is inserted before.
@utpilla is looking at implementing the item 1 to avoid sorting costs.
The most critical improvements are already shipped as part of 1.2 itself. Removing this from 1.3 milestone, as it can occur anytime. (1.3 is releasing in May 2022 with other improvements)