cortex Merging slices from the labels APIs using more than 1 cpu and k-way

Merging slices from the labels APIs using more than 1 cpu and k-way

Open alanprot opened this issue 1 year ago • 0 comments

What this PR does: This PR changes the strategy we use to merge sorted slices containing the labels values/keys on the GetLabels and GetLabelValues APIs.

Before we were getting the response from each ingesters, adding in a map and sorting at the end to dedup/sort the response.

Now we are using Loser Tree for merge (same as used on https://github.com/prometheus/prometheus/pull/12878) and also using more than one core to merge those slices.

Benchmark from the GetLabelsValues APIs with different duplication factors (Ex: 0.67 duplication ratio means that the resulting slice will have 33% of the sum of the strings on the input slices):

goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
cpu: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
                                                                                                    │   /tmp/old   │              /tmp/new               │
                                                                                                    │    sec/op    │    sec/op     vs base               │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   23.19m ± ∞ ¹   13.27m ± ∞ ¹  -42.79% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32   6.349m ± ∞ ¹   3.231m ± ∞ ¹  -49.12% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   94.62m ± ∞ ¹   50.59m ± ∞ ¹  -46.53% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32   23.08m ± ∞ ¹   12.59m ± ∞ ¹  -45.46% (p=0.008 n=5)
geomean                                                                                               23.81m         12.85m        -46.02%
¹ need >= 6 samples for confidence interval at level 0.95

                                                                                                    │   /tmp/old    │                /tmp/new                │
                                                                                                    │     B/op      │      B/op       vs base                │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   3.483Mi ± ∞ ¹   11.323Mi ± ∞ ¹  +225.05% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32   446.0Ki ± ∞ ¹    973.3Ki ± ∞ ¹  +118.24% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   12.94Mi ± ∞ ¹    35.34Mi ± ∞ ¹  +173.03% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32   994.9Ki ± ∞ ¹   2227.3Ki ± ∞ ¹  +123.87% (p=0.008 n=5)
geomean                                                                                               2.090Mi          5.363Mi        +156.61%
¹ need >= 6 samples for confidence interval at level 0.95

                                                                                                    │   /tmp/old   │              /tmp/new               │
                                                                                                    │  allocs/op   │  allocs/op    vs base               │
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   2509.0 ± ∞ ¹    907.0 ± ∞ ¹  -63.85% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters150,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32    739.0 ± ∞ ¹    857.0 ± ∞ ¹  +15.97% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.67-32   6.962k ± ∞ ¹   2.660k ± ∞ ¹  -61.79% (p=0.008 n=5)
Distributor_GetLabelsValues/numIngesters500,lblValuesPerIngester1000,lblValuesDuplicateRatio0.98-32   2.264k ± ∞ ¹   2.606k ± ∞ ¹  +15.11% (p=0.008 n=5)
geomean                                                                                               2.325k         1.524k        -34.47%
¹ need >= 6 samples for confidence interval at level 0.95

Benchmark with the merge implementation in isolation:

goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util
cpu: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32         	     484	   2308154 ns/op	  791212 B/op	     217 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32    	     850	   1313718 ns/op	  372744 B/op	     109 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.3-32    	    1147	    973408 ns/op	  868365 B/op	     209 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32         	    1327	    830787 ns/op	  212041 B/op	      61 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32    	    1024	   1130711 ns/op	   94216 B/op	     106 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.8-32    	    1842	    620375 ns/op	  383621 B/op	     200 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32        	    2479	    450711 ns/op	   52476 B/op	      21 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32   	    1136	   1020612 ns/op	   45064 B/op	     105 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:100,stringsPerInput:100,duplicateRatio:0.95-32   	    3314	    341552 ns/op	  134249 B/op	     186 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32         	      86	  12017615 ns/op	 3187257 B/op	    1043 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32    	     171	   6776256 ns/op	 2434920 B/op	     161 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.3-32    	     238	   4916036 ns/op	 4608115 B/op	     273 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32         	     283	   4055139 ns/op	  831591 B/op	     213 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32    	     216	   5447690 ns/op	  354152 B/op	     156 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.8-32    	     418	   2764875 ns/op	 1953886 B/op	     258 allocs/op
BenchmarkMergeSlicesParallel/usingMap,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32        	     543	   2135988 ns/op	  212043 B/op	      61 allocs/op
BenchmarkMergeSlicesParallel/parallelism:1,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32   	     241	   4911898 ns/op	  165736 B/op	     155 allocs/op
BenchmarkMergeSlicesParallel/parallelism:8,inputSize:150,stringsPerInput:300,duplicateRatio:0.95-32   	     835	   1355215 ns/op	  572329 B/op	     239 allocs/op

Which issue(s) this PR fixes: Fixes #

Checklist

[X] Tests updated
[ ] Documentation added
[ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Feb 22 '24 22:02 alanprot

cortex cortex copied to clipboard

Merging slices from the labels APIs using more than 1 cpu and k-way

cortex
cortex copied to clipboard