
Improve performance of MeterAdapter

Open exyi opened this issue 2 years ago • 5 comments

(sorry for an empty email message, I accidentally pressed enter in the title)

The MeterAdapter is quite slow compared to using prometheus-net directly. I noticed this especially in multi-threaded scenarios: it is not hard to hammer a single metric in parallel on a many-core machine. This PR adds benchmarks to demonstrate the improvement, and it improves performance in both the single-threaded and the multi-threaded case.

Main changes are

  • _options.ResolveHistogramBuckets is no longer called on each observation. Its result is now cached together with the rest of the metric initialization code.
  • The locks are removed from LifetimeManager
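
The first change in the list above can be illustrated with a minimal sketch. This is not the PR's code: `BucketCache`, the `resolveBuckets` callback, the instrument name, and the bucket values are all hypothetical stand-ins. Only the idea is taken from the description, namely resolving the histogram buckets once per instrument and reusing the result on every observation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Count how often the (hypothetical) resolver callback actually runs.
int resolverCalls = 0;
var cache = new BucketCache(name =>
{
    Interlocked.Increment(ref resolverCalls);
    return new[] { 0.01, 0.1, 1.0, 10.0 }; // made-up bucket boundaries
});

// Simulate 200000 observations against a single histogram instrument.
long touched = 0;
for (int i = 0; i < 200_000; i++)
{
    double[] buckets = cache.GetBuckets("request_duration_seconds");
    touched += buckets.Length; // stand-in for recording the observation
}

Console.WriteLine($"Resolver calls: {resolverCalls}"); // prints "Resolver calls: 1"

// Hypothetical helper, not part of prometheus-net: caches the bucket
// boundaries per instrument name so the resolver is not run per observation.
public sealed class BucketCache
{
    private readonly ConcurrentDictionary<string, double[]> _cache = new();
    private readonly Func<string, double[]> _resolveBuckets;

    public BucketCache(Func<string, double[]> resolveBuckets) =>
        _resolveBuckets = resolveBuckets;

    // Cache hits take no lock. If several threads miss at the same time,
    // the resolver may run more than once, but only one result is stored.
    public double[] GetBuckets(string instrumentName) =>
        _cache.GetOrAdd(instrumentName, _resolveBuckets);
}
```

In this sketch the cost of the resolver is paid once per instrument instead of once per measurement, which is the kind of saving the first bullet describes.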

Other details should be described in the commit messages or comments.

Before

| Method                 | MeasurementCount | ThreadCount | TargetMetricType | Mean       | Error    | StdDev    | Gen0        | Completed Work Items | Lock Contentions | Gen1      | Allocated  |
|----------------------- |----------------- |------------ |----------------- |-----------:|---------:|----------:|------------:|---------------------:|-----------------:|----------:|-----------:|
| MeasurementPerformance | 200000           | 1           | CounterInt       |   101.2 ms |  0.40 ms |   0.80 ms |   9000.0000 |                    - |                - |         - |  158.69 MB |
| MeasurementPerformance | 200000           | 1           | CounterFloat     |   102.8 ms |  0.69 ms |   1.35 ms |   3000.0000 |                    - |                - |         - |  158.69 MB |
| MeasurementPerformance | 200000           | 1           | HistogramInt     |   154.2 ms |  1.36 ms |   2.71 ms |   4000.0000 |                    - |                - |         - |  196.84 MB |
| MeasurementPerformance | 200000           | 1           | HistogramFloat   |   151.8 ms |  1.15 ms |   2.27 ms |   4000.0000 |                    - |                - |         - |  196.84 MB |
| MeasurementPerformance | 200000           | 16          | CounterInt       | 1,438.3 ms | 39.26 ms | 165.81 ms | 159000.0000 |                    - |       89825.0000 | 1000.0000 | 2539.07 MB |
| MeasurementPerformance | 200000           | 16          | CounterFloat     | 1,474.8 ms | 29.78 ms | 124.11 ms |  53000.0000 |                    - |       56599.0000 |         - | 2539.07 MB |
| MeasurementPerformance | 200000           | 16          | HistogramInt     | 1,769.7 ms | 35.19 ms | 115.24 ms |  66000.0000 |                    - |       74652.0000 |         - | 3149.42 MB |
| MeasurementPerformance | 200000           | 16          | HistogramFloat   | 1,725.7 ms | 33.78 ms | 112.99 ms |  66000.0000 |                    - |       62252.0000 |         - | 3149.42 MB |

After

| Method                 | MeasurementCount | ThreadCount | TargetMetricType | Mean      | Error     | StdDev     | Completed Work Items | Lock Contentions | Gen0       | Allocated |
|----------------------- |----------------- |------------ |----------------- |----------:|----------:|-----------:|---------------------:|-----------------:|-----------:|----------:|
| MeasurementPerformance | 200000           | 1           | CounterInt       |  37.89 ms |  0.041 ms |   0.080 ms |                    - |                - |          - |  27.47 MB |
| MeasurementPerformance | 200000           | 1           | CounterFloat     |  38.83 ms |  0.267 ms |   0.520 ms |                    - |                - |  1000.0000 |  27.47 MB |
| MeasurementPerformance | 200000           | 1           | HistogramInt     |  48.21 ms |  0.178 ms |   0.347 ms |                    - |                - |  2000.0000 |  33.57 MB |
| MeasurementPerformance | 200000           | 1           | HistogramFloat   |  47.69 ms |  0.125 ms |   0.246 ms |                    - |                - |          - |  33.57 MB |
| MeasurementPerformance | 200000           | 16          | CounterInt       | 881.51 ms | 29.158 ms | 123.458 ms |                    - |          13.0000 | 27000.0000 | 439.46 MB |
| MeasurementPerformance | 200000           | 16          | CounterFloat     | 906.75 ms | 27.166 ms | 115.024 ms |                    - |          10.0000 |  9000.0000 | 439.46 MB |
| MeasurementPerformance | 200000           | 16          | HistogramInt     | 928.74 ms | 30.498 ms | 129.131 ms |                    - |          16.0000 | 33000.0000 | 537.11 MB |
| MeasurementPerformance | 200000           | 16          | HistogramFloat   | 851.88 ms | 30.896 ms | 130.815 ms |                    - |          15.0000 | 33000.0000 | 537.11 MB |

exyi, Oct 20 '23 20:10

The main remaining bottleneck is now the RWLock wrapping the LifetimeManager: in a profile, 88% of the time is spent entering and exiting the lock. It is not trivial to remove, but I have an idea for making this lock-free as well (and for eliminating the race that can currently occur when a metric is deleted and incremented at the same time). Would you be interested in that?

exyi, Oct 20 '23 21:10

Thanks! There have also been some other optimization-related changes that cause some code conflicts here, but I will try to manually merge some of the good ideas from this PR.

One thing that has already been done is moving lifetime management from the metric instance level to the metric template level, although I am wondering whether it might be feasible to kick it up one more level, to the metric factory, to reduce the overhead further.

sandersaares, Nov 29 '23 12:11

Great! Moving the lifetime timer higher up is definitely good news. At least for me, it doesn't really matter whether it's global or per metric template. I don't have thousands of meters, just many different label values.

I rebased it on the current master branch; there was only a minor conflict. Do you have these changes in a private branch?

exyi, Nov 29 '23 14:11

Yeah, I am currently working in the "optimizing" branch.

sandersaares, Nov 30 '23 07:11

| Method         | MeasurementCount | Mean     | Error    | StdDev   | Allocated |
|--------------- |----------------- |---------:|---------:|---------:|----------:|
| CounterInt     | 100000           | 31.15 ms | 0.336 ms | 0.314 ms |     106 B |
| CounterFloat   | 100000           | 31.03 ms | 0.283 ms | 0.265 ms |     106 B |
| HistogramInt   | 100000           | 32.06 ms | 0.169 ms | 0.158 ms |     106 B |
| HistogramFloat | 100000           | 31.52 ms | 0.296 ms | 0.277 ms |      53 B |

In the optimizing branch the story is starting to look decent now. Thanks for peeking under the covers here; you motivated some good improvements, both with your PR and with other ideas that came from this!

sandersaares, Nov 30 '23 21:11