mimir Changing active series matches from []bool to []int of indexes for better…

… memory performance

What this PR does

This PR is replacing the bool slice tracking matches for each matcher configured for each series reference with a []int tracking only matching matchers' indexes. The usage pattern of the custom trackers implies that in most usecases only a fraction of matchers are matching, thus this design saves on the stored values.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

[x] Tests updated
[x] Documentation added
[ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Jul 28 '22 08:07 gubjanos

Can we see benchmarks? I would expect accessing the map being way slower than accessing the slice.

Jul 28 '22 08:07 pracucci

I've run the existing benchmarks this way:

git checkout main
go test -run '^$' -bench '.*' -benchmem -count=3 ./pkg/ingester/activeseries > before.txt

hub pr checkout 2568
go test -run '^$' -bench '.*' -benchmem -count=3 ./pkg/ingester/activeseries > after.txt

benchstat before.txt after.txt

No significant difference:

name                                                   old time/op    new time/op    delta
ActiveSeriesTest_single_series/50-12                     51.1ns ± 2%    55.1ns ± 3%   ~     (p=0.100 n=3+3)
ActiveSeriesTest_single_series/100-12                    51.0ns ± 1%    55.6ns ± 0%   ~     (p=0.100 n=3+3)
ActiveSeriesTest_single_series/500-12                    55.4ns ± 5%    58.1ns ± 4%   ~     (p=0.400 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12      93.5ms ± 1%    93.5ms ± 2%   ~     (p=0.700 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      1.30s ± 4%     1.30s ± 5%   ~     (p=1.000 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12      431ms ± 2%     428ms ± 1%   ~     (p=1.000 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     5.35s ± 0%     5.36s ± 0%   ~     (p=0.700 n=3+3)
ActiveSeries_Active_once-12                               242µs ± 0%     277µs ± 3%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_twice-12                              255µs ± 1%     287µs ± 0%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
ActiveSeriesTest_single_series/50-12                      0.00B          0.00B        ~     (all equal)
ActiveSeriesTest_single_series/100-12                     0.00B          0.00B        ~     (all equal)
ActiveSeriesTest_single_series/500-12                     0.00B          0.00B        ~     (all equal)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12      19.6MB ± 0%    18.0MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      238MB ± 0%     222MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12     19.6MB ± 0%    18.0MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     238MB ± 0%     222MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_once-12                                104B ± 3%      115B ± 1%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_twice-12                               108B ± 1%      120B ± 1%   ~     (p=0.100 n=3+3)

name                                                   old allocs/op  new allocs/op  delta
ActiveSeriesTest_single_series/50-12                       0.00           0.00        ~     (all equal)
ActiveSeriesTest_single_series/100-12                      0.00           0.00        ~     (all equal)
ActiveSeriesTest_single_series/500-12                      0.00           0.00        ~     (all equal)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12        208k ± 0%      208k ± 0%   ~     (p=0.200 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      2.02M ± 0%     2.02M ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12       208k ± 0%      208k ± 0%   ~     (p=0.200 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     2.02M ± 0%     2.02M ± 0%   ~     (p=0.400 n=3+3)
ActiveSeries_Active_once-12                                5.00 ± 0%      5.00 ± 0%   ~     (all equal)
ActiveSeries_Active_twice-12                               6.00 ± 0%      6.00 ± 0%   ~     (all equal)

Then I checked the benchmarks (I should have done it earlier) and I've realized no benchmark set the matchers, so obviously there's no difference. Please @gubjanos add benchmark(s) using matchers too.

Jul 28 '22 08:07 pracucci

I have created a benchmark for testing Matchers. The benchmark checks for different number of trackers and different number of labels on the serie. Please note that this setup is biased to my assumption, as only one of the trackers are matching. If we would increse the number of matching trackers, the memory footprint would increase. In this way I think it is better than implementing a bitset, but still only if the assumption of sparse matching holds. Here are my results:

name                                                   old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          150ns ± 3%     154ns ± 2%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         148ns ± 2%     152ns ± 1%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        152ns ± 1%     155ns ± 2%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16        1.09µs ± 2%    1.05µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16       1.12µs ± 1%    1.05µs ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16      1.14µs ± 1%    1.05µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       9.90µs ± 2%   10.14µs ± 3%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      9.82µs ± 1%   10.16µs ± 2%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     9.74µs ± 1%   10.48µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       164µs ± 1%     169µs ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      165µs ± 1%     168µs ± 2%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     163µs ± 1%     180µs ± 1%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16      10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16     10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16    10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16           1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16          1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16      1.00 ± 0%      1.00 ± 0%   ~     (all equal)

Aug 01 '22 14:08 gubjanos

This result looks good to me, however, if we want to optimize here and we know that most of the series match just once, we can do better.

I made a quick PoC replacing your []int by a fixedSlice type that tries to use a fixed array before allocating a slice, like:

const fixedSliceSize = 4

type fixedSlice struct {
	arr  [fixedSliceSize]int
	arrl int
	rest []int
}

Ran your benchmarks, and it's faster:

$ benchstat BenchmarkMatchesSeries.slice.txt BenchmarkMatchesSeries.fixed.txt
name                                                   old time/op  new time/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12        216ns ± 4%   174ns ± 4%  -19.09%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12       215ns ± 1%   176ns ± 2%  -17.93%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12      208ns ± 5%   167ns ± 1%  -19.63%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12      1.38µs ± 4%  1.30µs ± 2%   -5.93%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12     1.35µs ± 2%  1.29µs ± 2%   -4.60%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12    1.34µs ± 1%  1.30µs ± 2%   -3.34%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12     16.6µs ± 1%  15.2µs ± 2%   -8.19%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12    16.4µs ± 1%  14.9µs ± 1%   -9.20%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12   17.2µs ± 3%  16.4µs ± 1%   -4.46%  (p=0.016 n=5+4)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12     238µs ±14%   232µs ± 4%     ~     (p=1.000 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12    236µs ± 1%   213µs ± 6%   -9.62%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12   241µs ± 3%   211µs ± 1%  -12.19%  (p=0.008 n=5+5)

And the best thing, it does zero allocations for your test case (each series matching just one matcher):

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12          8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12         8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12         8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12      8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12      8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12     8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12           1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)

IMO, reducing one allocation per active series is a pretty big deal (that's 1.5M less stuff in the heap for GC to iterate on, in the worst case, because not all series match something), but I would like to hear @bboreham's opinion on this.

OTOH, I'm saying it's a quick PoC because I'd fine-tune a few things if we used this for prod:

I'd not use ints, but uint16 for matching tracker position, and limit the max number of trackers to 64k (let's be honest, if you have 64k trackers, you're not using the feature properly, and something will explode way before).
I'd reduce the size of fixedSliceSize to 3, and change arrl type to byte, so that three values plus len would pack into 8 bytes. This would mean that it's only an 8-byte overhead over the existing []int which is 24bytes plus a heap allocation.
I'd rename fixedSlice into something less terrible.

Aug 02 '22 08:08 colega

@gubjanos I think the last comment from @colega was very valid. Are you still interested in following up with the suggestions from Oleg?

Sep 28 '22 12:09 pracucci

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

Oct 07 '22 09:10 pracucci

@pracucci @colega Sorry for the delay. Yes, I am indeed interested in pushing this forward, but could we please first agree that this is something we would like to ship? @bboreham any objection?

Oct 18 '22 08:10 gubjanos

I think it's interesting to ship this improvements, then we can talk about how trackers can be further improved.

Oct 18 '22 09:10 colega

Sorry for the delay, changes lgtm with a few nitpicks and a missing changelog entry.

Can you also please post the latest benchmark comparison? Thank you.

Oct 20 '22 09:10 colega

Here is the comparison between your PoC version and the one with the improvements.

name                                                   old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          128ns ± 0%     126ns ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         129ns ± 2%     126ns ± 1%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        133ns ± 4%     129ns ± 2%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16         949ns ± 2%     919ns ± 2%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16        930ns ± 1%     915ns ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16       921ns ± 1%     907ns ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       8.55µs ± 1%    8.63µs ± 0%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      8.82µs ± 4%    8.99µs ± 0%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     8.68µs ± 2%    8.73µs ± 1%   ~     (p=1.000 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       147µs ± 2%     148µs ± 2%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      145µs ± 2%     157µs ± 2%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     143µs ± 5%     154µs ± 3%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16         0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16      0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     0.00B          0.00B        ~     (all equal)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16           0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16          0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16       0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16       0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16      0.00           0.00        ~     (all equal)

Oct 24 '22 07:10 gubjanos

I've run a benchmark comparison between the current main version and the one that's being introduced here

Click here to see the benchmark I've ported to `main` to compare

func BenchmarkMatchesSeries(b *testing.B) {
	trackerCounts := []int{10, 100, 1000}
	asms := make([]*Matchers, len(trackerCounts))

	for i, matcherCount := range trackerCounts {
		configMap := map[string]string{}
		for j := 0; j < matcherCount; j++ {
			configMap[strconv.Itoa(j)] = fmt.Sprintf(`{this_will_match_%d="true"}`, j)
		}
		config, err := NewCustomTrackersConfig(configMap)
		require.NoError(b, err)
		asms[i] = NewMatchers(config)
	}

	makeLabels := func(total, matching int) labels.Labels {
		if total < matching {
			b.Fatal("wrong test setup, total < matching")
		}
		lbs := make(labels.Labels, 0, total)
		for i := 0; i < matching; i++ {
			lbs = append(lbs, labels.Label{Name: fmt.Sprintf(`this_will_match_%d`, i), Value: "true"})
		}
		for i := matching; i < total; i++ {
			lbs = append(lbs, labels.Label{Name: fmt.Sprintf("something_else_%d", i), Value: "true"})
		}
		return lbs
	}

	for i, trackerCount := range trackerCounts {
		for _, bc := range []struct {
			total, matching int
		}{
			{1, 0},
			{1, 1},
			{10, 1},
			{10, 2},
			{10, 5},
			{25, 1},
			{25, 2},
			{25, 5},
			{100, 1},
			{100, 2},
			{100, 5},
		} {
			series := makeLabels(bc.total, bc.matching)
			b.Run(fmt.Sprintf("TrackerCount: %d, Labels: %d, Matching: %d", trackerCount, bc.total, bc.matching), func(b *testing.B) {
				for x := 0; x < b.N; x++ {
					got := asms[i].Matches(series)
					matched := 0
					for _, v := range got {
						if v {
							matched++
						}
					}
					require.Equal(b, bc.matching, matched)
				}
			})
		}
	}
}

(I'll post a suggestion with some changes to your benchmark too in a separate comment, so it will match the one I ran here)

The results look great and kind of as the expected ones:

name                                                        old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0         435ns ± 1%     428ns ± 1%    -1.69%  (p=0.016 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1         461ns ± 1%     454ns ± 1%    -1.57%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1        512ns ± 1%     509ns ± 1%      ~     (p=0.421 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2        532ns ± 2%     531ns ± 2%      ~     (p=0.841 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5        574ns ± 2%     625ns ± 3%    +9.02%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1        777ns ± 1%     787ns ± 1%    +1.39%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2        765ns ± 1%     789ns ± 3%    +3.15%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5        723ns ± 2%     777ns ± 0%    +7.43%  (p=0.016 n=5+4)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1      2.17µs ± 0%    2.29µs ± 2%    +5.35%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2      2.00µs ± 0%    2.11µs ± 0%    +5.08%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5      1.50µs ± 3%    1.60µs ± 1%    +6.44%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0        885ns ± 1%     811ns ± 1%    -8.35%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1        915ns ± 2%     846ns ± 0%    -7.53%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1      1.43µs ± 1%    1.35µs ± 0%    -5.90%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2      1.46µs ± 0%    1.38µs ± 0%    -5.33%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5      1.52µs ± 0%    1.51µs ± 0%    -0.79%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1      2.37µs ± 0%    2.39µs ± 2%      ~     (p=0.635 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2      2.44µs ± 0%    2.53µs ± 3%    +3.73%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5      2.51µs ± 0%    2.58µs ± 1%    +3.02%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1     6.86µs ± 1%    6.94µs ± 0%      ~     (p=0.190 n=5+4)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2     6.70µs ± 0%    6.77µs ± 0%    +1.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5     6.23µs ± 1%    6.35µs ± 0%    +1.84%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0      5.16µs ± 0%    4.62µs ± 1%   -10.48%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1      5.31µs ± 0%    4.83µs ± 1%    -9.08%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1     9.43µs ± 1%    8.83µs ± 0%    -6.27%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2     9.72µs ± 0%    8.95µs ± 1%    -7.92%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5     9.79µs ± 0%    9.22µs ± 3%    -5.79%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1     16.4µs ± 1%    20.5µs ±12%   +25.12%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2     17.2µs ± 3%    18.4µs ± 5%    +7.07%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5     18.8µs ± 0%    20.0µs ± 4%    +6.34%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1    52.7µs ± 2%    52.7µs ± 0%      ~     (p=0.421 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2    53.0µs ± 1%    52.5µs ± 0%    -0.89%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5    52.1µs ± 0%    52.2µs ± 0%      ~     (p=0.548 n=5+5)

name                                                        old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0         16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1         16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5        16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5        16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5       16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0         112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1         112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5        112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5        112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1       112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2       112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5       112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0      1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1      1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5     1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5     1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1    1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2    1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5    1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)

name                                                        old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5      1.00 ± 0%      1.00 ± 0%      ~     (all equal)

There's a similar performance (+/- 5% everywhere) with zero allocations except for 5 matching trackers.

Oct 24 '22 16:10 colega