Significantly improve inhibitor performance via a new cache data structure
This PR matches or improves on the performance of the existing inhibitor in almost all benchmark cases. When each inhibition rule has many inhibiting alerts, performance improves by several orders of magnitude. No change to the inhibitor interface is necessary.
│ main.txt │ icache-for-upstream-final.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.865µ ± 4% 1.597µ ± 7% -14.35% (p=0.002 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 2.007µ ± 7% 1.464µ ± 5% -27.04% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.941µ ± 7% 1.447µ ± 16% -25.46% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 1.884µ ± 3% 1.422µ ± 10% -24.55% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.879µ ± 13% 1.419µ ± 4% -24.49% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.973µ ± 8% 1.404µ ± 4% -28.86% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.726µ ± 10% 1.348µ ± 13% -63.84% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 21.485µ ± 5% 1.411µ ± 3% -93.43% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 184.245µ ± 9% 1.489µ ± 6% -99.19% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 20.453µ ± 14% 1.409µ ± 5% -93.11% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 2.226µ ± 8% 1.912µ ± 8% -14.13% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 6.959µ ± 2% 6.293µ ± 5% -9.57% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 55.08µ ± 4% 48.42µ ± 3% -12.10% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 534.1µ ± 7% 488.9µ ± 4% -8.45% (p=0.002 n=6)
geomean                                               8.267µ        3.181µ        -61.52%
The new implementation replaces the InhibitRule's scache with an icache (meaning "intersection cache") which pre-calculates the equal label values a target alert would need to be inhibited by each inhibitor alert. Practically, this is implemented by calculating a fingerprint for just the equalLabels of each inhibiting alert and storing that alert in a map keyed by the equalLabels fingerprint. This allows O(1) access to the set of inhibiting alerts for an InhibitRule in the Inhibitor.Mutes method. The new data structure is essentially just:
map[model.Fingerprint]map[model.Fingerprint]*types.Alert
// equalLabelsFp -> fp -> alert
To insert a new inhibitor alert "A" (see the sketch after this list):
- does "A" match the source labels? If not, exit.
- make a LabelSet of just the subset of A's labels which are in the InhibitRule's equal_labels. Compute the fingerprint for this LabelSet and call it "E".
- find the entry in the icache for E and insert A into it using A's fingerprint.
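For illustration, here is a minimal Go sketch of that insertion path; the names (icache, equalLabelsFingerprint, insert) are hypothetical and simplified, not the PR's exact code:

```go
package inhibit

import (
	"github.com/prometheus/common/model"

	"github.com/prometheus/alertmanager/types"
)

// icache maps the equal-labels fingerprint "E" to the set of inhibiting
// alerts that share those equal label values, keyed by each alert's own
// fingerprint.
type icache map[model.Fingerprint]map[model.Fingerprint]*types.Alert

// equalLabelsFingerprint computes "E": the fingerprint of just the subset of
// the alert's labels that appear in the rule's equal_labels.
func equalLabelsFingerprint(a *types.Alert, equal map[model.LabelName]struct{}) model.Fingerprint {
	ls := make(model.LabelSet, len(equal))
	for name := range equal {
		if value, ok := a.Labels[name]; ok {
			ls[name] = value
		}
	}
	return ls.Fingerprint()
}

// insert adds an inhibiting alert "A" that already matched the rule's source
// matchers to the cache entry for its equal-labels fingerprint.
func (c icache) insert(a *types.Alert, equal map[model.LabelName]struct{}) {
	e := equalLabelsFingerprint(a, equal)
	entry, ok := c[e]
	if !ok {
		entry = make(map[model.Fingerprint]*types.Alert)
		c[e] = entry
	}
	entry[a.Fingerprint()] = a
}
```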
To check if an alert "T" is inhibited (see the sketch after this list):
- does T match the target labels? If not, exit; T is not inhibited.
- make a LabelSet of just the subset of T's labels which are in the InhibitRule's equal_labels. Compute the fingerprint for this LabelSet and call it "E".
- does an entry exist in the InhibitRule's icache for E? If not, exit; T is not inhibited.
- find the first non-resolved alert in the icache entry and return it. T is inhibited by this alert.
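And the corresponding lookup, continuing the hypothetical icache type and equalLabelsFingerprint helper from the sketch above (it additionally needs the standard library "time" import). This is a sketch of the idea, not the PR's actual findInhibitor:

```go
// findInhibitor reports whether target alert "T" (which already matched the
// rule's target matchers) is inhibited. Passing "now" in mirrors the PR's
// point about calling time.Now() less often.
func (c icache) findInhibitor(t *types.Alert, equal map[model.LabelName]struct{}, now time.Time) (*types.Alert, bool) {
	e := equalLabelsFingerprint(t, equal)
	entry, ok := c[e]
	if !ok {
		// No inhibiting alert shares T's equal label values.
		return nil, false
	}
	for _, a := range entry {
		// The cache may still hold resolved alerts; iterate past them.
		if !a.ResolvedAt(now) {
			return a, true
		}
	}
	return nil, false
}
```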
The new implementation has a few other minor improvements which help performance: it reduces the number of times an inhibiting alert's fingerprint is calculated by caching it in the icache, it pre-calculates whether an inhibitor matches both the source and target matchers for an inhibit rule, it calls time.Now() less frequently (similar to #4119), and it avoids checking whether a target alert matches the source matchers unless necessary.
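As a rough illustration of those extras (hypothetical field names, not the PR's actual struct), each cached inhibitor can carry something like:

```go
// cachedInhibitor is a hypothetical per-inhibitor entry: the alert plus two
// cheap pre-computed values. The fingerprint (8 bytes) and the boolean are
// what account for the roughly 16 extra bytes per cached alert mentioned
// below, after padding.
type cachedInhibitor struct {
	alert       *types.Alert
	fingerprint model.Fingerprint // computed once at insert time, not on every Mutes call
	matchesBoth bool              // true if the alert matched both the source and target matchers when inserted
}
```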
In the real world, this change helps most when an inhibit rule covers many inhibitors and target alerts. For example, a rule where all critical alerts inhibit all warning alerts with the same instance and alertname can result in many possible inhibitors attached to a rule where each inhibitor alert only inhibits one target alert. The old implementation of the inhibitor is extremely inefficient in this case: assuming N inhibiting alerts and a fixed number of equalLabels, it performs O(N) string comparisons per target. The new implementation is effectively O(1) in this case (except in a pathological case where all alerts in the inhibitor are resolved, since the new implementation still has to iterate past resolved alerts).
The new implementation does require slightly more memory per inhibiting alert than the old one: we store the alert's fingerprint and a boolean to check if the inhibitor matches both the source and the target. This works out to be about 16 extra bytes per cached inhibitor alert after accounting for padding. For 10,000 alerts, that's about 160KB. I think this is a totally acceptable increase in memory requirements.
We've been running a very similar patch, which also adds all inhibiting alerts to the marker, in our production environment for more than a month. In our environment it still results in a more than 4x reduction in time spent in the Inhibitor.Mutes method.
This change was motivated by observations showing a large amount of CPU time spent doing string comparisons in the inhibitor. It seems like we're not the only ones who have been looking for this, and I think this might be a good alternative to #3933.
cc @grobinson-grafana or @gotjosh is there any interest in this PR? Or something I can do to make this easier to review?
I just updated the PR to include a new benchmark that's the same as 1_inhibition_rule,_X_inhibiting_alerts but uses the Equal field, so that the target labelset is only inhibited by the last source alert. I think this isolates the O(1) behavior of the new implementation pretty well.
Here's a quick 1 iteration run on my machine:
$ benchstat 1-run-main.txt 1-run-patch.txt
goos: linux
goarch: amd64
pkg: github.com/prometheus/alertmanager/inhibit
cpu: AMD EPYC Processor (with IBPB)
│ 1-run-main.txt │ 1-run-patch.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.878µ ± ∞ ¹ 1.824µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.893µ ± ∞ ¹ 1.758µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.890µ ± ∞ ¹ 1.663µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 2.108µ ± ∞ ¹ 1.754µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.815µ ± ∞ ¹ 1.513µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.902µ ± ∞ ¹ 1.511µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.561µ ± ∞ ¹ 1.505µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 23.580µ ± ∞ ¹ 1.608µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 177.102µ ± ∞ ¹ 1.573µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_random_match-4 2.480µ ± ∞ ¹ 2.155µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_random_match-4 7.440µ ± ∞ ¹ 2.228µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_random_match-4 23.019µ ± ∞ ¹ 1.698µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_random_match-4 176.798µ ± ∞ ¹ 1.740µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 19.699µ ± ∞ ¹ 1.455µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10_inhibition_rules,_last_rule_matches-4 2.834µ ± ∞ ¹ 1.989µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_last_rule_matches-4 13.936µ ± ∞ ¹ 7.869µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1000_inhibition_rules,_last_rule_matches-4 123.32µ ± ∞ ¹ 74.61µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10000_inhibition_rules,_last_rule_matches-4 1238.7µ ± ∞ ¹ 602.5µ ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 11.09µ 3.192µ -71.23%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
I'll also set up a version that just switches scache to a map[model.Fingerprint]*types.Alert and see how it does in comparison.
Ah, I typo-d the benchmark function name for 1_inhibition_rule,_1000_inhibiting_alerts_w/_random_match-4 and 1_inhibition_rule,_10000_inhibiting_alerts_w/_random_match-4 which explains the strange scaling there.
I just pushed a fixed version and things look more logical:
$ benchstat 1-run-remove-store.txt 1-run-patch.txt
goos: linux
goarch: amd64
pkg: github.com/prometheus/alertmanager/inhibit
cpu: AMD EPYC Processor (with IBPB)
│ 1-run-remove-store.txt │ 1-run-patch.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.753µ ± ∞ ¹ 1.515µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.677µ ± ∞ ¹ 1.476µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.727µ ± ∞ ¹ 1.425µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 1.596µ ± ∞ ¹ 1.425µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.671µ ± ∞ ¹ 1.435µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.882µ ± ∞ ¹ 1.665µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 1.881µ ± ∞ ¹ 1.349µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 2.012µ ± ∞ ¹ 1.404µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 2.146µ ± ∞ ¹ 1.534µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 2.501µ ± ∞ ¹ 2.126µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 6.389µ ± ∞ ¹ 2.046µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 54.494µ ± ∞ ¹ 2.575µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 637.438µ ± ∞ ¹ 1.746µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 1.696µ ± ∞ ¹ 1.497µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10_inhibition_rules,_last_rule_matches-4 2.862µ ± ∞ ¹ 2.153µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/100_inhibition_rules,_last_rule_matches-4 13.952µ ± ∞ ¹ 7.199µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/1000_inhibition_rules,_last_rule_matches-4 120.72µ ± ∞ ¹ 64.23µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Mutes/10000_inhibition_rules,_last_rule_matches-4 1191.2µ ± ∞ ¹ 608.0µ ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 6.855µ 3.062µ -55.34%
The baseline version has store.Alerts replaced with a plain map, so the performance improvement is not caused by saving allocations from scache.List().
Just for completeness, here's a full 6 iteration run which compares the implementation on main with the two local changes we discussed in the CNCF slack:
- store.Alerts replaced with a map
- similar excludeSourceAndTargetMatch logic as in this PR, so that the SourceMatchers.Matches call is lazy and the TargetMatchers.Matches call is pre-computed (see the sketch after this list)
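The excludeSourceAndTargetMatch change is essentially a memoized source-matcher check. A minimal sketch of the idea, assuming a hypothetical helper rather than the exact patch:

```go
// lazySourceMatch returns a function that evaluates SourceMatchers.Matches at
// most once for a given target label set, and only when it is actually needed
// (i.e. only if a candidate inhibitor with a two-sided match shows up).
func lazySourceMatch(r *InhibitRule, lset model.LabelSet) func() bool {
	var computed, matched bool
	return func() bool {
		if !computed {
			matched = r.SourceMatchers.Matches(lset)
			computed = true
		}
		return matched
	}
}
```

The pre-computed TargetMatchers.Matches side is just a flag stored alongside each inhibiting alert when it is inserted, as sketched earlier in the PR description.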
$ benchstat full-bench-remove-store-updated.txt full-bench-patch.txt
goos: linux
goarch: amd64
pkg: github.com/prometheus/alertmanager/inhibit
cpu: AMD EPYC Processor (with IBPB)
│ full-bench-remove-store-updated.txt │ full-bench-patch.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.712µ ± 6% 1.410µ ± 31% ~ (p=0.065 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.792µ ± 5% 1.325µ ± 18% -26.04% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.707µ ± 20% 1.401µ ± 11% -17.96% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 1.756µ ± 30% 1.433µ ± 29% -18.42% (p=0.026 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.925µ ± 53% 1.987µ ± 52% ~ (p=0.699 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.800µ ± 10% 1.594µ ± 55% ~ (p=0.240 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 2.232µ ± 29% 1.456µ ± 16% -34.74% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 1.724µ ± 27% 1.397µ ± 8% -18.97% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 1.991µ ± 34% 1.419µ ± 29% -28.73% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 2.418µ ± 8% 1.782µ ± 8% -26.31% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 6.822µ ± 13% 1.755µ ± 20% -74.28% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 53.252µ ± 7% 1.834µ ± 29% -96.56% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 635.912µ ± 24% 1.860µ ± 22% -99.71% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 1.686µ ± 6% 1.388µ ± 5% -17.68% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 2.736µ ± 17% 1.986µ ± 8% -27.40% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 13.307µ ± 7% 8.368µ ± 13% -37.12% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 123.30µ ± 8% 60.93µ ± 8% -50.58% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 1162.8µ ± 14% 616.1µ ± 6% -47.02% (p=0.002 n=6)
geomean 6.882µ 2.966µ -56.90%
One last commit addressing benchmark performance for now - I found that the difference in _last_rule_matches performance between this PR and the souped-up main branch version is mostly down to calls to time.Now. The main implementation calls time.Now much more frequently in that scenario.
I patched both that version and this PR to exit hasEqual and findInhibitor early if the InhibitRule cache is empty (a sketch of the guard follows the results below). Here are the new benchmark results for that set of patches:
│ full-bench-remove-store-updated_v2.txt │ full-bench-patch_v2.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.874µ ± 14% 1.565µ ± 30% -16.49% (p=0.041 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.679µ ± 6% 1.946µ ± 33% ~ (p=0.485 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.646µ ± 17% 1.436µ ± 32% -12.73% (p=0.037 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 1.708µ ± 8% 1.423µ ± 55% ~ (p=0.058 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.800µ ± 22% 1.456µ ± 23% -19.11% (p=0.015 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.799µ ± 9% 1.528µ ± 8% -15.09% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 1.718µ ± 16% 1.437µ ± 7% -16.36% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 1.747µ ± 8% 1.569µ ± 9% -10.19% (p=0.026 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 2.009µ ± 15% 1.572µ ± 14% -21.75% (p=0.004 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 2.547µ ± 47% 1.886µ ± 19% -25.95% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 7.097µ ± 9% 1.901µ ± 12% -73.22% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 50.929µ ± 8% 1.931µ ± 15% -96.21% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 580.522µ ± 8% 1.969µ ± 12% -99.66% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 1.777µ ± 25% 1.404µ ± 11% -21.02% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 2.113µ ± 36% 1.743µ ± 26% -17.49% (p=0.026 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 6.084µ ± 5% 4.237µ ± 17% -30.36% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 46.20µ ± 14% 30.51µ ± 6% -33.97% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 416.1µ ± 31% 305.5µ ± 11% -26.56% (p=0.002 n=6)
geomean 5.701µ 2.716µ -52.36%
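For reference, the early-exit guard is roughly this shape in the icache sketch from the PR description; the patched main version does the equivalent length check on its own cache:

```go
// empty reports whether the rule has no cached inhibiting alerts at all, in
// which case the lookup can bail out before any equal-labels fingerprinting
// or matcher work.
func (c icache) empty() bool {
	return len(c) == 0
}
```

Both hasEqual in the patched main version and findInhibitor in this PR check a guard like this before doing any per-target work.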
The PR's implementation is still a bit faster in _last_rule_matches in this run. My profiler seems to think this is related to the time it takes to acquire the locks. I don't think we'd see any difference in the real world where there's contention for the lock.
Overall, this PR's implementation is faster than what's on main in basically all scenarios, and still orders of magnitude faster than the souped-up main algorithm in cases where each inhibit rule has a lot of source alerts.
And just for completeness, here's the comparison between the PR and a totally unpatched main branch:
│ full-bench-main.txt │ full-bench-patch_v2.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.811µ ± 14% 1.565µ ± 30% -13.61% (p=0.041 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.690µ ± 4% 1.946µ ± 33% ~ (p=0.589 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.770µ ± 28% 1.436µ ± 32% -18.85% (p=0.037 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 1.828µ ± 6% 1.423µ ± 55% ~ (p=0.065 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 1.848µ ± 55% 1.456µ ± 23% -21.19% (p=0.004 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.960µ ± 22% 1.528µ ± 8% -22.05% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.700µ ± 14% 1.437µ ± 7% -61.18% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 25.883µ ± 10% 1.569µ ± 9% -93.94% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 183.483µ ± 30% 1.572µ ± 14% -99.14% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 2.526µ ± 8% 1.886µ ± 19% -25.34% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 8.404µ ± 8% 1.901µ ± 12% -77.39% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 72.286µ ± 9% 1.931µ ± 15% -97.33% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 794.827µ ± 18% 1.969µ ± 12% -99.75% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 21.776µ ± 8% 1.404µ ± 11% -93.55% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 2.759µ ± 4% 1.743µ ± 26% -36.82% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 14.541µ ± 8% 4.237µ ± 17% -70.86% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 127.45µ ± 6% 30.51µ ± 6% -76.06% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 1258.8µ ± 9% 305.5µ ± 11% -75.73% (p=0.002 n=6)
geomean 12.97µ 2.716µ -79.06%
Inhibition performance is significantly improved in #4607
It uses an index instead of a cache.
See this comment on how the index works.
@Spaceman1701 can you rebase this implementation and check if the cache provides performance gains on top of the index and if it's still worth doing?
Edit: looking at the PR further, it seems to be a similar idea but implemented differently. So probably we can close this.
Hey @siavashs, at first glance it looks like the implementation of the index is very similar to the implementation of the icache in this change. E.g. makeCacheKey here and fingerprintEquals are pretty much the same. I'm skimming #4607 here, but I think the primary difference is that this change has one data structure that serves as both the scache and the index. Internally, our version of alertmanager uses this so that it can return the fingerprints of all inhibiting alerts. The other minor differences in this change:
- the matcher evaluations are cached for two-sided matches
- calls to time.Now are factored out of inner loops

These make small improvements to performance. Given the similarities between #4607 and this change, it's probably easier to close this PR and implement those improvements on top of what exists here. Unfortunately, this isn't something I'll be able to do unless/if we're able to adopt the implementation that's been merged.
The one thing I don't understand is why the benchmarks in this PR seem better at a glance. It's possible this is down to the different machines we tested on.
Would you mind running the inhibitor benchmarks for this PR on your machine? If it still looks substantially different than the benchmarks for #4607, I can take a closer look to try to understand...
Here are the benchmarks I did for this branch (in case you want to compare with yours):
~/D/a/alertmanager (icache-with-first-result|✔) $ go test -bench=. -run='^$' -count 7 -benchmem ./inhibit/ | tee benchmark-inhibit-cache.txt
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/inhibit
cpu: Apple M3 Pro
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2549952 429.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2542627 430.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2352310 435.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2625428 434.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2573762 428.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2461554 426.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1_inhibiting_alert-12 2399557 428.8 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2341170 429.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2583420 427.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2442073 430.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2506743 426.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2377144 436.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2359090 441.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_1_inhibiting_alert-12 2466920 436.7 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2343828 438.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2412193 434.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2393886 434.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2749725 443.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2432582 436.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2458640 432.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1_inhibiting_alert-12 2364182 433.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2381535 469.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2245339 490.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2288902 466.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2076728 520.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2091968 539.7 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 2007397 537.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_1_inhibiting_alert-12 1971786 527.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 2130751 590.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 1946528 596.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 1963693 571.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 2125946 569.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 1999339 578.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 2062027 574.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_1_inhibiting_alert-12 1971615 585.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2241342 476.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2341010 470.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2333052 468.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2325962 473.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2372875 465.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2374576 467.8 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts-12 2292949 468.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2446972 467.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2345157 462.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2564450 468.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2285212 464.7 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2402594 471.7 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2269624 470.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts-12 2383495 464.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2263526 473.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2403828 467.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2276610 471.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2391637 468.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2182570 471.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2562704 469.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts-12 2214241 473.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2218141 473.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2128590 474.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2177030 472.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2245520 469.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2256829 476.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2179735 471.8 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts-12 2163909 489.1 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1875188 599.7 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1787878 602.4 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1911996 602.1 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1819026 601.3 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1801441 599.9 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1797752 603.3 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 1826916 607.0 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1794582 606.3 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1865400 605.4 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1661011 613.3 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1829118 604.7 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1835413 602.5 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1772190 599.5 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 1820858 606.5 ns/op 488 B/op 10 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1757442 635.8 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1733418 612.2 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1824864 612.1 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1822245 609.4 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1847073 608.8 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1786066 609.8 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 1760792 603.3 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1750762 604.6 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1774530 607.7 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1750663 606.0 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1729974 613.4 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1734894 610.4 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1770200 608.6 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 1775438 606.9 ns/op 496 B/op 11 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2305063 448.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2237691 453.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2370349 454.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2259199 450.8 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2310182 463.6 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2264394 454.2 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_1000_inhibiting_alerts-12 2294893 454.3 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1899793 556.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1913986 553.9 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1937922 554.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1871246 562.0 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1906359 553.7 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 1898174 552.4 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10_inhibition_rules,_last_rule_matches-12 2000439 559.5 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 720709 1540 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 697876 1554 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 713967 1610 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 685314 1549 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 718411 1553 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 692836 1556 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/100_inhibition_rules,_last_rule_matches-12 717962 1559 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 87838 12278 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 96835 12283 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 92163 12381 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 94401 13066 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 92577 13543 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 91092 12809 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/1000_inhibition_rules,_last_rule_matches-12 92542 14305 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 8779 116967 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 10286 121807 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 8880 123027 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 9526 126077 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 8800 126273 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 9118 134277 ns/op 432 B/op 8 allocs/op
BenchmarkMutes/10000_inhibition_rules,_last_rule_matches-12 8672 124294 ns/op 432 B/op 8 allocs/op
PASS
ok github.com/prometheus/alertmanager/inhibit 831.430s
And comparison with the current main:
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/inhibit
cpu: Apple M3 Pro
│ benchmark-inhibit-main.txt │ benchmark-inhibit-cache.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-12 487.0n ± 1% 429.3n ± 1% -11.85% (p=0.001 n=7)
Mutes/10_inhibition_rules,_1_inhibiting_alert-12 494.6n ± 1% 430.0n ± 3% -13.06% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1_inhibiting_alert-12 511.8n ± 3% 434.1n ± 2% -15.18% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-12 586.2n ± 4% 520.5n ± 10% -11.21% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-12 656.4n ± 1% 578.6n ± 3% -11.85% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-12 566.6n ± 2% 468.1n ± 2% -17.38% (p=0.001 n=7)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-12 567.2n ± 2% 467.3n ± 1% -17.61% (p=0.001 n=7)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-12 566.7n ± 13% 471.4n ± 1% -16.82% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-12 564.2n ± 7% 473.9n ± 3% -16.00% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-12 544.9n ± 1% 454.2n ± 2% -16.65% (p=0.001 n=7)
Mutes/10_inhibition_rules,_last_rule_matches-12 1058.0n ± 2% 554.4n ± 1% -47.60% (p=0.001 n=7)
Mutes/100_inhibition_rules,_last_rule_matches-12 6.175µ ± 3% 1.554µ ± 4% -74.83% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_last_rule_matches-12 60.00µ ± 4% 12.81µ ± 12% -78.65% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_last_rule_matches-12 606.3µ ± 3% 124.3µ ± 8% -79.50% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 602.1n ± 1%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 605.4n ± 1%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 609.8n ± 4%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 607.7n ± 1%
geomean 1.585µ 879.6n -38.28%
│ benchmark-inhibit-main.txt │ benchmark-inhibit-cache.txt │
│ B/op │ B/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/10_inhibition_rules,_1_inhibiting_alert-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1_inhibiting_alert-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-12 488.0 ± 0% 432.0 ± 0% -11.48% (p=0.001 n=7)
Mutes/10_inhibition_rules,_last_rule_matches-12 472.0 ± 0% 432.0 ± 0% -8.47% (p=0.001 n=7)
Mutes/100_inhibition_rules,_last_rule_matches-12 472.0 ± 0% 432.0 ± 0% -8.47% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_last_rule_matches-12 472.0 ± 0% 432.0 ± 0% -8.47% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_last_rule_matches-12 473.0 ± 0% 432.0 ± 0% -8.67% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 488.0 ± 0%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 488.0 ± 0%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 496.0 ± 0%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 496.0 ± 0%
geomean 483.4 444.7 -10.64%
│ benchmark-inhibit-main.txt │ benchmark-inhibit-cache.txt │
│ allocs/op │ allocs/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/10_inhibition_rules,_1_inhibiting_alert-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1_inhibiting_alert-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/10_inhibition_rules,_last_rule_matches-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/100_inhibition_rules,_last_rule_matches-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1000_inhibition_rules,_last_rule_matches-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/10000_inhibition_rules,_last_rule_matches-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.001 n=7)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-12 10.00 ± 0%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-12 10.00 ± 0%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-12 11.00 ± 0%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-12 11.00 ± 0%
geomean 10.00 8.496 -20.00%
There are performance improvements in some cases.
Interesting... I think some of this is from calling time.Now less, but I'm not sure what the source of the per-inhibition-rule improvement is. IIRC, none of the benchmarks pass excludeTwoSidedMatch, so that can't explain it. I believe this PR does one fewer map lookup, but I highly doubt that's measurable at this timescale.
I'm not sure if I'll have time to investigate this in depth for a while, unfortunately.
Ok, I had a few minutes to kick off some benchmarks in the background. Interestingly, I'm seeing an even bigger per-inhibit rule performance improvement. I really don't know what's causing this...
│ main-with-index.txt │ icache-with-first-result.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 3.726µ ± 35% 1.381µ ± 19% -62.93% (p=0.002 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 3.359µ ± 16% 1.448µ ± 17% -56.89% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 3.714µ ± 31% 1.431µ ± 10% -61.48% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 3.696µ ± 22% 1.498µ ± 25% -59.47% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 8.774µ ± 154% 1.484µ ± 56% -83.09% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 3.954µ ± 20% 1.498µ ± 13% -62.12% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.607µ ± 7% 1.552µ ± 35% -56.97% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 3.716µ ± 13% 1.468µ ± 8% -60.50% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 3.418µ ± 13% 1.506µ ± 28% -55.94% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 3.669µ ± 25% 1.329µ ± 22% -63.77% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 16.239µ ± 6% 1.664µ ± 6% -89.76% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 147.144µ ± 18% 4.798µ ± 17% -96.74% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 1449.51µ ± 7% 31.36µ ± 17% -97.84% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 16880.0µ ± 43% 302.8µ ± 33% -98.21% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 1.793µ ± 8%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 1.902µ ± 11%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 1.781µ ± 18%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 1.881µ ± 42%
geomean 15.76µ 2.635µ -81.47%
I'll investigate... Either way, I think we should just build on what we've merged to main rather than try to rebase this onto main.
So... I think that most of the performance difference is actually coming from With calls on Prometheus metric vectors 😮. My profiler says about 50% of CPU time is spent on this. I'm going to try to refactor a bit so that we already have references to the right metrics inside the hot path (see the sketch below).
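Roughly what I mean by moving the metric vector selection out of the hot path; the metric and label names here are made up for illustration, not Alertmanager's actual metrics:

```go
package inhibit

import "github.com/prometheus/client_golang/prometheus"

// exampleMatchesTotal is an illustrative CounterVec only.
var exampleMatchesTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_inhibit_rule_matches_total",
		Help: "Illustrative counter for the refactor described above.",
	},
	[]string{"rule"},
)

// ruleMetrics resolves the CounterVec child once, when the rule is built, so
// the hot path does a plain Inc() instead of a With/WithLabelValues lookup on
// every Mutes call.
type ruleMetrics struct {
	matches prometheus.Counter
}

func newRuleMetrics(ruleName string) ruleMetrics {
	return ruleMetrics{
		matches: exampleMatchesTotal.WithLabelValues(ruleName),
	}
}
```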
That seems to have been it - here's the preliminary benchmark after moving metric vector selection outside the hot path:
│ main-with-index.txt │ main-with-index-fix-metrics.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 3.726µ ± 35% 2.736µ ± 40% -26.55% (p=0.041 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 3.359µ ± 16% 2.429µ ± 23% -27.69% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 3.714µ ± 31% 2.693µ ± 16% -27.48% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 3.696µ ± 22% 2.702µ ± 43% -26.91% (p=0.026 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 8.774µ ± 154% 4.248µ ± 26% -51.58% (p=0.015 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 3.954µ ± 20% 2.742µ ± 12% -30.66% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.607µ ± 7% 2.528µ ± 5% -29.91% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 3.716µ ± 13% 2.091µ ± 29% -43.74% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 3.418µ ± 13% 2.206µ ± 15% -35.46% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 3.669µ ± 25% 1.985µ ± 8% -45.89% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 16.239µ ± 6% 4.982µ ± 16% -69.32% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 147.14µ ± 18% 34.80µ ± 5% -76.35% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 1449.5µ ± 7% 335.6µ ± 5% -76.85% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 16.880m ± 43% 3.701m ± 8% -78.08% (p=0.002 n=6)
geomean 15.76µ 7.747µ -50.85%
And a bit more improvement after consolidating some time.Now() calls:
│ main-with-index.txt │ main-with-index-fix-metrics-and-time.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 3.726µ ± 35% 1.806µ ± 10% -51.52% (p=0.002 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 3.359µ ± 16% 1.894µ ± 25% -43.63% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 3.714µ ± 31% 1.922µ ± 13% -48.24% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 3.696µ ± 22% 2.071µ ± 20% -43.95% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 8.774µ ± 154% 2.856µ ± 26% -67.45% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 3.954µ ± 20% 1.877µ ± 43% -52.54% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 3.607µ ± 7% 2.134µ ± 18% -40.84% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 3.716µ ± 13% 1.912µ ± 36% -48.55% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 3.418µ ± 13% 2.047µ ± 16% -40.11% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 3.669µ ± 25% 1.860µ ± 6% -49.30% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 16.239µ ± 6% 4.400µ ± 10% -72.91% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 147.14µ ± 18% 27.75µ ± 6% -81.14% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 1449.5µ ± 7% 278.1µ ± 7% -80.81% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 16.880m ± 43% 2.888m ± 10% -82.89% (p=0.002 n=6)
geomean 15.76µ 6.152µ -60.98%
It might improve things a little further if we can cache the excludeTwoSidedMatch result, but I don't think much more improvement is possible without removing some of the new metrics.
For comparison, this is where we are compared to this PR:
│ main-with-index-fix-metrics-and-time.txt │ icache-with-first-result.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4 1.806µ ± 10% 1.381µ ± 19% -23.53% (p=0.009 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4 1.894µ ± 25% 1.448µ ± 17% -23.53% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4 1.922µ ± 13% 1.431µ ± 10% -25.57% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4 2.071µ ± 20% 1.498µ ± 25% -27.69% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4 2.856µ ± 26% 1.484µ ± 56% -48.05% (p=0.004 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4 1.877µ ± 43% 1.498µ ± 13% -20.20% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4 2.134µ ± 18% 1.552µ ± 35% -27.27% (p=0.009 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4 1.912µ ± 36% 1.468µ ± 8% -23.23% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4 2.047µ ± 16% 1.506µ ± 28% -26.43% (p=0.015 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4 1.860µ ± 6% 1.329µ ± 22% -28.55% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4 4.400µ ± 10% 1.664µ ± 6% -62.19% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4 27.751µ ± 6% 4.798µ ± 17% -82.71% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4 278.10µ ± 7% 31.36µ ± 17% -88.72% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4 2888.3µ ± 10% 302.8µ ± 33% -89.52% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4 1.793µ ± 8%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4 1.902µ ± 11%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4 1.781µ ± 18%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4 1.881µ ± 42%
geomean 6.152µ 2.635µ -52.52%
I think I might open a PR with the changes I have. I can iterate a little more there.
Close in favor of #4668