cortex
cortex copied to clipboard
Merging slices from the labels APIs using more than 1 cpu and k-way
What this PR does: This PR changes the strategy we use to merge sorted slices containing the labels values/keys on the GetLabels and GetLabelValues APIs.
Before we were getting the response from each ingesters, adding in a map and sorting at the end to dedup/sort the response.
Now we are using Loser Tree for merge (same as used on https://github.com/prometheus/prometheus/pull/12878) and also using more than one core to merge those slices.
Benchmark from the GetLabelsValues APIs with different duplication factors (Ex: 0.67 duplication ratio means that the resulting slice will have 33% of the sum of the strings on the input slices):
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
cpu: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
│ /tmp/old │ /tmp/new │
│ sec/op │ sec/op vs base │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 23.19m ± ∞ ¹ 13.27m ± ∞ ¹ -42.79% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 6.349m ± ∞ ¹ 3.231m ± ∞ ¹ -49.12% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 94.62m ± ∞ ¹ 50.59m ± ∞ ¹ -46.53% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 23.08m ± ∞ ¹ 12.59m ± ∞ ¹ -45.46% (p=0.008 n=5)
geomean 23.81m 12.85m -46.02%
¹ need >= 6 samples for confidence interval at level 0.95
│ /tmp/old │ /tmp/new │
│ B/op │ B/op vs base │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 3.483Mi ± ∞ ¹ 11.323Mi ± ∞ ¹ +225.05% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 446.0Ki ± ∞ ¹ 973.3Ki ± ∞ ¹ +118.24% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 12.94Mi ± ∞ ¹ 35.34Mi ± ∞ ¹ +173.03% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 994.9Ki ± ∞ ¹ 2227.3Ki ± ∞ ¹ +123.87% (p=0.008 n=5)
geomean 2.090Mi 5.363Mi +156.61%
¹ need >= 6 samples for confidence interval at level 0.95
│ /tmp/old │ /tmp/new │
│ allocs/op │ allocs/op vs base │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 2509.0 ± ∞ ¹ 907.0 ± ∞ ¹ -63.85% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 739.0 ± ∞ ¹ 857.0 ± ∞ ¹ +15.97% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32 6.962k ± ∞ ¹ 2.660k ± ∞ ¹ -61.79% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32 2.264k ± ∞ ¹ 2.606k ± ∞ ¹ +15.11% (p=0.008 n=5)
geomean 2.325k 1.524k -34.47%
¹ need >= 6 samples for confidence interval at level 0.95
Benchmark with the merge implementation in isolation:
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util
cpu: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32 484 2308154 ns/op 791212 B/op 217 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32 850 1313718 ns/op 372744 B/op 109 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32 1147 973408 ns/op 868365 B/op 209 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32 1327 830787 ns/op 212041 B/op 61 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32 1024 1130711 ns/op 94216 B/op 106 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32 1842 620375 ns/op 383621 B/op 200 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32 2479 450711 ns/op 52476 B/op 21 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32 1136 1020612 ns/op 45064 B/op 105 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32 3314 341552 ns/op 134249 B/op 186 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32 86 12017615 ns/op 3187257 B/op 1043 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32 171 6776256 ns/op 2434920 B/op 161 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32 238 4916036 ns/op 4608115 B/op 273 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32 283 4055139 ns/op 831591 B/op 213 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32 216 5447690 ns/op 354152 B/op 156 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32 418 2764875 ns/op 1953886 B/op 258 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32 543 2135988 ns/op 212043 B/op 61 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32 241 4911898 ns/op 165736 B/op 155 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32 835 1355215 ns/op 572329 B/op 239 allocs/op
Which issue(s) this PR fixes:
Fixes #
Checklist
- [X] Tests updated
- [ ] Documentation added
- [ ]
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]