Improve performance of MeterAdapter
(sorry for an empty email message, I accidentally pressed enter in the title)
The MeterAdapter is quite slow compared to using prometheus-net directly, I noticed that especially in a multi threaded scenario - it isn't that hard to hammer one metric in parallel on a many core machine. This PR adds benchmarks to demonstrate the improvement, and improves performance both in the single-threaded and the multi-threaded case.
Main changes are
_options.ResolveHistogramBucketsis not called on each observation. It is not cached together with the rest of metric initialization code.- The locks are removed from
LifetimeManager
Other details should be described in the commit messages or comments.
Before
| Method | MeasurementCount | ThreadCount | TargetMetricType | Mean | Error | StdDev | Gen0 | Completed Work Items | Lock Contentions | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MeasurementPerformance | 200000 | 1 | CounterInt | 101.2 ms | 0.40 ms | 0.80 ms | 9000.0000 | - | - | - | 158.69 MB |
| MeasurementPerformance | 200000 | 1 | CounterFloat | 102.8 ms | 0.69 ms | 1.35 ms | 3000.0000 | - | - | - | 158.69 MB |
| MeasurementPerformance | 200000 | 1 | HistogramInt | 154.2 ms | 1.36 ms | 2.71 ms | 4000.0000 | - | - | - | 196.84 MB |
| MeasurementPerformance | 200000 | 1 | HistogramFloat | 151.8 ms | 1.15 ms | 2.27 ms | 4000.0000 | - | - | - | 196.84 MB |
| MeasurementPerformance | 200000 | 16 | CounterInt | 1,438.3 ms | 39.26 ms | 165.81 ms | 159000.0000 | - | 89825.0000 | 1000.0000 | 2539.07 MB |
| MeasurementPerformance | 200000 | 16 | CounterFloat | 1,474.8 ms | 29.78 ms | 124.11 ms | 53000.0000 | - | 56599.0000 | - | 2539.07 MB |
| MeasurementPerformance | 200000 | 16 | HistogramInt | 1,769.7 ms | 35.19 ms | 115.24 ms | 66000.0000 | - | 74652.0000 | - | 3149.42 MB |
| MeasurementPerformance | 200000 | 16 | HistogramFloat | 1,725.7 ms | 33.78 ms | 112.99 ms | 66000.0000 | - | 62252.0000 | - | 3149.42 MB |
After
| Method | MeasurementCount | ThreadCount | TargetMetricType | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| MeasurementPerformance | 200000 | 1 | CounterInt | 37.89 ms | 0.041 ms | 0.080 ms | - | - | - | 27.47 MB |
| MeasurementPerformance | 200000 | 1 | CounterFloat | 38.83 ms | 0.267 ms | 0.520 ms | - | - | 1000.0000 | 27.47 MB |
| MeasurementPerformance | 200000 | 1 | HistogramInt | 48.21 ms | 0.178 ms | 0.347 ms | - | - | 2000.0000 | 33.57 MB |
| MeasurementPerformance | 200000 | 1 | HistogramFloat | 47.69 ms | 0.125 ms | 0.246 ms | - | - | - | 33.57 MB |
| MeasurementPerformance | 200000 | 16 | CounterInt | 881.51 ms | 29.158 ms | 123.458 ms | - | 13.0000 | 27000.0000 | 439.46 MB |
| MeasurementPerformance | 200000 | 16 | CounterFloat | 906.75 ms | 27.166 ms | 115.024 ms | - | 10.0000 | 9000.0000 | 439.46 MB |
| MeasurementPerformance | 200000 | 16 | HistogramInt | 928.74 ms | 30.498 ms | 129.131 ms | - | 16.0000 | 33000.0000 | 537.11 MB |
| MeasurementPerformance | 200000 | 16 | HistogramFloat | 851.88 ms | 30.896 ms | 130.815 ms | - | 15.0000 | 33000.0000 | 537.11 MB |
The main remaining bottleneck now is the RWLock wrapping the LifetimeManager, in a profile 88% time is spent entering and exiting the lock. It's not that trivial to remove it, but I have an idea how to make this also lockfree (and also do away with the race which now may occur when deleting and incrementing metric at the same time). Would you be interested in that?
Thanks! There have also been some other optimization-related changes that cause some code conflicts here but I will try manually merge some of the good ideas from this PR.
One thing that has already been done is reducing the lifetime management from metric instance to metric template level, although I am wondering if it might be feasible to kick it up even one more level to the metric factory to reduce the overhead further.
Great! Moving the the lifetime timer higher up is definitely good news. At least for me it doesn't really matter if it's global or per-template metric. I don't have thousands of meters, just many different label values.
I rebased it on the current master branch, there was only a minor conflict. Do you have these changes in a private branch?
Yeah, I am currently working in the "optimizing" branch
| Method | MeasurementCount | Mean | Error | StdDev | Allocated |
|--------------- |----------------- |---------:|---------:|---------:|----------:|
| CounterInt | 100000 | 31.15 ms | 0.336 ms | 0.314 ms | 106 B |
| CounterFloat | 100000 | 31.03 ms | 0.283 ms | 0.265 ms | 106 B |
| HistogramInt | 100000 | 32.06 ms | 0.169 ms | 0.158 ms | 106 B |
| HistogramFloat | 100000 | 31.52 ms | 0.296 ms | 0.277 ms | 53 B |
In optimizing branch the story is starting to look decent now. Thanks for peeking under the covers here, you motivated some good improvement both with your PR and with other ideas that came from this!