Add Inhibitor.MutesAll()
For the context please refer to #3932.
I added benchmarks, which consider more than one target alert. The following benchstat output compares:
- invoking
Inhibitor.Mutes()for each target alert (old.txt) - with the
Inhibitor.MutesAll()implementation (new.txt).Inhibitor.MutesAll()increase the speed in certain cases at the cost of some memory.
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/inhibit
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Mutes/1_inhibition_rule,_1_inhibiting_alert,_1_target_alert-10 768.5n ± 3% 842.5n ± 5% +9.62% (p=0.001 n=10)
Mutes/10_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10 769.5n ± 2% 834.4n ± 0% +8.44% (p=0.000 n=10)
Mutes/100_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10 779.3n ± 1% 841.4n ± 1% +7.96% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10 839.5n ± 1% 914.7n ± 4% +8.96% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_100_inhibiting_alert,_100_target_alert-10 157.39µ ± 1% 54.41µ ± 1% -65.43% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10 897.5n ± 3% 962.6n ± 2% +7.25% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_100_inhibiting_alert,_100_target_alert-10 63.07µ ± 6% 57.59µ ± 3% -8.68% (p=0.000 n=10)
Mutes/1_inhibition_rule,_10_inhibiting_alerts,_1_target_alert-10 886.9n ± 1% 1087.0n ± 1% +22.57% (p=0.000 n=10)
Mutes/1_inhibition_rule,_100_inhibiting_alerts,_1_target_alert-10 1.929µ ± 2% 3.083µ ± 1% +59.84% (p=0.000 n=10)
Mutes/1_inhibition_rule,_100_inhibiting_alerts,_100_target_alert-10 174.18µ ± 1% 60.44µ ± 1% -65.30% (p=0.000 n=10)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts,_1_target_alert-10 12.67µ ± 1% 25.82µ ± 1% +103.87% (p=0.000 n=10)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts,_1000_target_alert-10 12425.7µ ± 1% 582.3µ ± 1% -95.31% (p=0.000 n=10)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts,_1_target_alert-10 104.4µ ± 1% 278.8µ ± 1% +167.13% (p=0.000 n=10)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts,_1_target_alert-10 11.89µ ± 2% 24.69µ ± 0% +107.65% (p=0.000 n=10)
Mutes/10_inhibition_rules,_last_rule_matches,_1_target_alert-10 1.012µ ± 3% 1.125µ ± 1% +11.22% (p=0.000 n=10)
Mutes/100_inhibition_rules,_last_rule_matches,_1_target_alert-10 3.178µ ± 1% 3.872µ ± 1% +21.86% (p=0.000 n=10)
Mutes/100_inhibition_rules,_last_rule_matches,_100_target_alerts-10 299.9µ ± 1% 227.9µ ± 1% -24.00% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_last_rule_matches,_1_target_alert-10 24.62µ ± 3% 31.57µ ± 1% +28.24% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_last_rule_matches,_1000_target_alert-10 24.29m ± 0% 18.27m ± 0% -24.77% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_last_rule_matches,_1_target_alert-10 239.6µ ± 6% 306.5µ ± 0% +27.94% (p=0.000 n=10)
geomean 18.71µ 17.34µ -7.36%
In cases where a low count of alerts is involved Inhibitor.MutesAll() is 2-3 times slower, due to the additional memory allocations.
This should still be sufficiently fast given the small scale.
When the count of target alerts is increased Inhibitor.MutesAll() becomes faster.
Invoking it as part of GET /alerts still needs to be done.
Do you intend to call it from MuteStage as well as APIv2?
Yes.
The PR seems incomplete without it?
I thought I check the general interest for the topic before implementing everything. I will add it. Might take a few days.
I added MutesAll to the api and MuteStage code paths. The CI needs a re-run, network foo during setup.
Is it possible to optimize inhibition rules without changing the Mutes interface?
I implemented that. In the current state the algorithmic improvement can only be realised when all alerts are passed to MutesAll(). By moving setAlertStatus out of alertFilter the different API handlers can choose how to update it. For getAlertsHandler I used MutesAll() here.
Is it possible to optimize inhibition rules without changing the Mutes interface?
I implemented that. In the current state the algorithmic improvement can only be realised when all alerts are passed to
MutesAll().
I looked again at the code, but I'm afraid I don't see where the optimization is implemented for Mutes()? Like you said, the performance optimization only works if you pass all alerts to MutesAll(), and there is no performance improvement if you only call Mutes()? Did I misunderstand?
My suggestion was to see if we can optimize the performance of Mutes without having to pass all alerts to MutesAll(). For example, what if the cache is persistent, like matcherCache for silences?
there is no performance improvement if you only call
Mutes()?
Exactly. The whole speed improvement comes from caching the source cache evaluation of InhibitRules across multiple alerts. In the context of inhibitions, that's the whole set of potential target alerts. For MutesAll() it roughly boils down to:
for _, r := range ih.rules {
var alerts []*types.Alert // <- reused in the inner loop
var scacheEval []bool // <- reused in the inner loop
for i, lset := range lsets {
// ...
if inhibitedByFP, eq := r.hasEqualCached(lset, r.SourceMatchers.Matches(lset), alerts, scacheEval); eq {
ih.marker.SetInhibited(fingerprints[i], inhibitedByFP.String())
}
// ...
}
}
Just using Mutes() flips the two loops. That looks roughly like:
func someOtherFunc() {
// ...
for _, a := range alerts {
inhibitor.Mutes(a.Labels) // <- evaluates the source cache for each a
}
}
func (ih *Inhibitor) Mutes(lset model.LabelSet) bool {
// ...
for _, r := range ih.rules {
// ...
if inhibitedByFP, eq := r.hasEqual(lset, r.SourceMatchers.Matches(lset)); eq { // <- always evaluates the source cache matches
ih.marker.SetInhibited(fp, inhibitedByFP.String())
return true
}
}
}
Caching the source cache evaluations "requires" to pass a list of alerts. If the signature change/addition is undesirable, we can close this PR. It doesn't work without. Technically, the state MutesAll() caches can be externalized into a struct, which implements the Muter interface, e.g.:
type CachingInhibitor {
base Inhibitor
var alerts [][]*types.Alert // one cache for each inhibition rule
var scacheEval [][]bool // one cache for each inhibition rule
}
Externalizing the state would cost more memory, likely requires locking and requires explicitly dropping these caches at some point (likely after calling Mutes() a few times).
My suggestion was to see if we can optimize the performance of Mutes without having to pass all alerts to MutesAll(). For example, what if the cache is persistent, like matcherCache for silences?
I expect something in that direction is possible as well, but it's something for different PR.
Hi! I know it's been a while, is this change something you're still interested in working on?
Inhibitor performance has also been improved a lot, so I'm not sure what the performance gains of this PR are anymore. If you still think it'd be useful (and have the time), the first step is probably to rebase and collect new benchmarks.
Hey, thanks for the information. We tried the latest release, but haven't seen a significant improvement unfortunately.
If you still think it'd be useful (and have the time), the first step is probably to rebase and collect new benchmarks.
I'll have to see, if I get around to it. 😢
Hey, thanks for the information. We tried the latest release, but haven't seen a significant improvement unfortunately.
Ah actually, these improvements aren't in a release quite yet. They should be included in v0.30 (which we're planning to cut in a few weeks).