opentelemetry-dotnet Metric AggregatorStore improvements

Opening an issue to track future improvements.

The current AggStore uses 2 data structures to find the MetricPoint. One is an pre-allocated array of MetricPoint, where the actual points are stored, and then a concurrentdictionary (2 level), to find the index within the array for a given keys/values combinations.

The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
Replace the inner concurrent dictionary with hashtable to avoid dual lock on writes - https://github.com/open-telemetry/opentelemetry-dotnet/pull/2339#discussion_r706997727
Design a totally new data structure specialized for the purpose.
Special case 0 tags - see discussion : https://github.com/open-telemetry/opentelemetry-dotnet/pull/2339#discussion_r711217242

Given this is purely internal implementation without any public API, this can be addressed after other high priority asks.

Sep 17 '21 18:09 cijothomas

The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.

I like this design. However, there might be a bug here. Let's say we add the first measurement of A with attributes of (c = 3, b=2, a=1), then another one B with attributes of (c=3, a=1, b=2). Based on the description above, the second lookup will fail. And then we sort the attributes of B and get (a=1,b=2,c=3). Note that it's not the same as the attributes of A, and we have to create a new metric here, which is obviously not what we want.

Oct 29 '21 05:10 findmyway

The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.

I like this design. However, there might be a bug here. Let's say we add the first measurement of A with attributes of (c = 3, b=2, a=1), then another one B with attributes of (c=3, a=1, b=2). Based on the description above, the second lookup will fail. And then we sort the attributes of B and get (a=1,b=2,c=3). Note that it's not the same as the attributes of A, and we have to create a new metric here, which is obviously not what we want.

Not sure I fully understood the bug. When (c = 3, b=2, a=1) is encountered, its lookup will fail. Then its sorted an another lookup is done, which also fails. Then both original order and sorted order is inserted to dictionary. Next, (c=3, a=1, b=2) comes. Its initial lookup fails. Then sorted lookup occurs, which succeeds.

(btw, this is not implemented yet)

Oct 29 '21 14:10 cijothomas

... Then both original order and sorted order is inserted to dictionary...

Now I see. Thanks for your explanation. I thought only the original order is inserted before.

Oct 29 '21 15:10 findmyway

@utpilla is looking at implementing the item 1 to avoid sorting costs.

Jan 10 '22 22:01 cijothomas

The most critical improvements are already shipped as part of 1.2 itself. Removing this from 1.3 milestone, as it can occur anytime. (1.3 is releasing in May 2022 with other improvements)

May 24 '22 04:05 cijothomas

opentelemetry-dotnet opentelemetry-dotnet copied to clipboard

Metric AggregatorStore improvements

opentelemetry-dotnet
opentelemetry-dotnet copied to clipboard