GH-41035: [C++] Add a grouper benchmark for preventing performance regression
Rationale for this change
Add a Grouper benchmark to help prevent performance regressions, as discussed in https://github.com/apache/arrow/pull/40998#issuecomment-2039204161.
What changes are included in this PR?
Adds a new Grouper benchmark, arrow-compute-grouper-benchmark.
Are these changes tested?
Not needed; this PR only adds a benchmark.
Are there any user-facing changes?
No
- GitHub Issue: #41035
:warning: GitHub issue #41035 has been automatically assigned in GitHub to PR creator.
@ursabot please benchmark
Benchmark runs are scheduled for commit 620fc87373a76e1805167cb2ed81bb482f10eb1a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
@pitrou @westonpace PTAL?
Thanks for your patience. Conbench analyzed the 7 benchmarking runs that have been run so far on PR commit 620fc87373a76e1805167cb2ed81bb482f10eb1a.
There was 1 benchmark result with an error:
- Pull Request Run on ursa-i9-9960x at 2024-04-05 17:16:43Z
There were 4 benchmark results indicating a performance regression:
- Pull Request Run on ursa-thinkcentre-m75q at 2024-04-05 18:32:36Z
  - ArrayArrayKernel (C++) with params=<SubtractChecked, UInt64Type>/size:524288/inverse_null_proportion:0, source=cpp-micro, suite=arrow-compute-scalar-arithmetic-benchmark
  - ArrayArrayKernel (C++) with params=<SubtractChecked, Int64Type>/size:524288/inverse_null_proportion:0, source=cpp-micro, suite=arrow-compute-scalar-arithmetic-benchmark
- and 2 more (see the report linked below)
The full Conbench report has more details.
@ursabot please benchmark
Benchmark runs are scheduled for commit 001d4c30a2d7d886fa5b452803e8be84efb3f82e. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 001d4c30a2d7d886fa5b452803e8be84efb3f82e.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details.
I would really prefer items/s over bytes/s, as bytes/s is highly misleading. In the end, most of the time we are interested in performance over logical items, not in how large the data representation is.
(that said, it's not forbidden to add both)
One minor nit: I prefer bytes processed over items processed (it is more comparable, and I don't have to know as many details of the benchmark), but I won't insist on it.
As pitrou said, bytes/s is not accurate for this benchmark: because it includes a variable-length type, the bytes/s figure can look very high, and it is less intuitive than items/s.
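To illustrate why bytes/s is hard to interpret with a variable-length key, here is a minimal sketch that estimates the buffer footprint of an Arrow utf8 column (a validity bitmap, n + 1 int32 offsets, and the character data, ignoring padding) and derives both rates for the same row count and timing. The helper names Utf8ColumnBytes, Rates and ComputeRates are hypothetical and not part of the benchmark. In Google Benchmark itself the two counters would usually be reported through state.SetItemsProcessed(...) and state.SetBytesProcessed(...).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: approximate bytes in an Arrow utf8 column,
// i.e. validity bitmap + (n + 1) int32 offsets + character data.
// Buffer padding/alignment is deliberately ignored.
int64_t Utf8ColumnBytes(const std::vector<std::string>& values) {
  const int64_t n = static_cast<int64_t>(values.size());
  int64_t data_bytes = 0;
  for (const auto& v : values) data_bytes += static_cast<int64_t>(v.size());
  const int64_t validity_bytes = (n + 7) / 8;  // one bit per row
  const int64_t offsets_bytes = (n + 1) * 4;   // int32 offsets
  return validity_bytes + offsets_bytes + data_bytes;
}

struct Rates {
  double items_per_second;
  double bytes_per_second;
};

// Both rates for one iteration that processed `rows` rows
// (occupying `bytes` bytes) in `seconds` seconds.
Rates ComputeRates(int64_t rows, int64_t bytes, double seconds) {
  return {static_cast<double>(rows) / seconds,
          static_cast<double>(bytes) / seconds};
}
```

With 1024 one-character keys versus 1024 hundred-character keys and identical timing, items/s is the same while bytes/s differs by roughly 20x purely because of string length, which is the ambiguity described above.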
The results are as follows:
2024-04-11T09:04:45+08:00
Running ./arrow-compute-grouper-benchmark
Run on (16 X 3800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 512 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 1.18, 1.64, 1.97
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------
GrouperWithMultiTypes/"{boolean}"/1024/10000 1329 us 1329 us 526 items_per_second=770.333k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/100 1508 us 1507 us 465 items_per_second=679.37k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/10 1534 us 1533 us 457 items_per_second=667.84k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/2 1616 us 1616 us 432 items_per_second=633.552k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/1 1720 us 1720 us 405 items_per_second=595.489k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/0 1333 us 1333 us 527 items_per_second=768.172k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean}"/4096/10000 5127 us 5127 us 137 items_per_second=798.951k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/100 5819 us 5818 us 120 items_per_second=703.99k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/10 5877 us 5876 us 118 items_per_second=697.06k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/2 6308 us 6307 us 112 items_per_second=649.449k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/1 6641 us 6640 us 105 items_per_second=616.851k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/0 5121 us 5120 us 136 items_per_second=800.072k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32}"/1024/10000 1368 us 1368 us 513 items_per_second=748.798k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/100 1620 us 1620 us 432 items_per_second=632.098k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/10 1597 us 1597 us 439 items_per_second=641.304k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/2 1625 us 1624 us 431 items_per_second=630.423k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/1 1637 us 1636 us 428 items_per_second=625.837k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/0 1379 us 1378 us 508 items_per_second=743.075k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32}"/4096/10000 5402 us 5401 us 130 items_per_second=758.434k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/100 6196 us 6195 us 113 items_per_second=661.201k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/10 6165 us 6164 us 113 items_per_second=664.549k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/2 6300 us 6299 us 111 items_per_second=650.222k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/1 6287 us 6286 us 111 items_per_second=651.6k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/0 5412 us 5411 us 129 items_per_second=757.029k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int64}"/1024/10000 1347 us 1347 us 518 items_per_second=760.192k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/100 1569 us 1568 us 447 items_per_second=652.955k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/10 1577 us 1576 us 444 items_per_second=649.554k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/2 1604 us 1604 us 435 items_per_second=638.412k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/1 1607 us 1607 us 435 items_per_second=637.123k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/0 1374 us 1374 us 522 items_per_second=745.246k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int64}"/4096/10000 5371 us 5370 us 129 items_per_second=762.703k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/100 6181 us 6180 us 113 items_per_second=662.826k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/10 6157 us 6156 us 113 items_per_second=665.384k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/2 6234 us 6232 us 112 items_per_second=657.237k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/1 6251 us 6250 us 113 items_per_second=655.382k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/0 5371 us 5370 us 130 items_per_second=762.709k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{utf8}"/1024/10000 2497 us 2496 us 281 items_per_second=410.19k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/100 2694 us 2694 us 260 items_per_second=380.111k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/10 2617 us 2616 us 268 items_per_second=391.371k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/2 2551 us 2551 us 274 items_per_second=401.434k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/1 3188 us 3187 us 220 items_per_second=321.258k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/0 2501 us 2500 us 273 items_per_second=409.55k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{utf8}"/4096/10000 10590 us 10588 us 66 items_per_second=386.85k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/100 10617 us 10615 us 66 items_per_second=385.858k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/10 10379 us 10377 us 67 items_per_second=394.717k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/2 10074 us 10072 us 70 items_per_second=406.684k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/1 12487 us 12485 us 56 items_per_second=328.078k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/0 9820 us 9817 us 71 items_per_second=417.214k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/10000 3061 us 3060 us 228 items_per_second=334.659k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/100 3316 us 3315 us 209 items_per_second=308.917k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/10 3340 us 3339 us 211 items_per_second=306.641k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/2 3242 us 3241 us 217 items_per_second=315.953k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/1 3157 us 3156 us 221 items_per_second=324.446k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/0 3114 us 3114 us 226 items_per_second=328.872k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/10000 12839 us 12836 us 55 items_per_second=319.102k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/100 13759 us 13755 us 51 items_per_second=297.774k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/10 13416 us 13413 us 53 items_per_second=305.371k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/2 13103 us 13101 us 54 items_per_second=312.646k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/1 12628 us 12626 us 55 items_per_second=324.398k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/0 12778 us 12775 us 55 items_per_second=320.619k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/10000 3102 us 3102 us 226 items_per_second=330.162k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/100 13830 us 13827 us 51 items_per_second=74.0558k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/10 394675 us 394582 us 2 items_per_second=2.59515k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/2 1409485 us 1409178 us 1 items_per_second=726.665/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/1 4343 us 4343 us 162 items_per_second=235.808k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/0 3111 us 3111 us 226 items_per_second=329.148k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/10000 13898 us 13894 us 51 items_per_second=294.793k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/100 54716 us 54700 us 12 items_per_second=74.8806k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/10 941758 us 941554 us 1 items_per_second=4.35026k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/2 12871536 us 12868926 us 1 items_per_second=318.286/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/1 16971 us 16968 us 41 items_per_second=241.392k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/0 12383 us 12381 us 56 items_per_second=330.834k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/1024/10000 1635 us 1634 us 428 items_per_second=626.497k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/100 2020 us 2019 us 344 items_per_second=507.07k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/10 2114 us 2114 us 331 items_per_second=484.407k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/2 2318 us 2317 us 303 items_per_second=441.867k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/1 2499 us 2498 us 280 items_per_second=409.852k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/0 1636 us 1636 us 420 items_per_second=625.86k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/4096/10000 6523 us 6522 us 108 items_per_second=628.047k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/100 8122 us 8120 us 86 items_per_second=504.427k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/10 8473 us 8472 us 83 items_per_second=483.476k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/2 9209 us 9207 us 76 items_per_second=444.856k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/1 9863 us 9861 us 70 items_per_second=415.358k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/0 6532 us 6531 us 107 items_per_second=627.155k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/1024/10000 1628 us 1628 us 431 items_per_second=629.181k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/100 12616 us 12614 us 55 items_per_second=81.1817k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/10 89235 us 89216 us 8 items_per_second=11.4778k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/2 279436 us 279381 us 2 items_per_second=3.66524k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/1 2474 us 2473 us 283 items_per_second=413.992k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/0 1627 us 1627 us 429 items_per_second=629.503k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/4096/10000 6484 us 6483 us 107 items_per_second=631.833k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/100 49194 us 49186 us 14 items_per_second=83.2752k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/10 414187 us 414110 us 2 items_per_second=9.89109k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/2 1350538 us 1350126 us 1 items_per_second=3.03379k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/1 9590 us 9588 us 73 items_per_second=427.195k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/0 6475 us 6474 us 108 items_per_second=632.68k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/10000 3679 us 3678 us 190 items_per_second=278.42k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/100 17617 us 17614 us 40 items_per_second=58.1361k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/10 191058 us 191017 us 4 items_per_second=5.36077k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/2 718916 us 718763 us 1 items_per_second=1.42467k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/1 5451 us 5450 us 129 items_per_second=187.874k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/0 3675 us 3675 us 190 items_per_second=278.664k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/10000 17341 us 17338 us 41 items_per_second=236.242k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/100 100409 us 100389 us 7 items_per_second=40.8013k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/10 921094 us 920851 us 1 items_per_second=4.44806k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/2 3632795 us 3631833 us 1 items_per_second=1.12781k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/1 21414 us 21411 us 33 items_per_second=191.299k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/0 14967 us 14965 us 46 items_per_second=273.711k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/10000 3724 us 3724 us 188 items_per_second=275.008k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/100 13209 us 13207 us 53 items_per_second=77.5372k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/10 140985 us 140961 us 5 items_per_second=7.26443k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/2 500871 us 500788 us 2 items_per_second=2.04478k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/1 5491 us 5490 us 128 items_per_second=186.517k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/0 3724 us 3723 us 188 items_per_second=275.053k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/10000 17473 us 17469 us 40 items_per_second=234.471k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/100 73179 us 73165 us 9 items_per_second=55.9829k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/10 602449 us 602346 us 1 items_per_second=6.80008k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/2 4447236 us 4446227 us 1 items_per_second=921.231/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/1 21541 us 21538 us 32 items_per_second=190.176k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/0 15239 us 15236 us 46 items_per_second=268.834k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/10000 4207 us 4207 us 166 items_per_second=243.428k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/100 35682 us 35671 us 20 items_per_second=28.7065k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/10 280647 us 280589 us 2 items_per_second=3.64946k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/2 739914 us 739742 us 1 items_per_second=1.38427k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/1 6503 us 6502 us 107 items_per_second=157.495k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/0 4218 us 4218 us 166 items_per_second=242.792k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/10000 17237 us 17233 us 40 items_per_second=237.677k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/100 153723 us 153695 us 5 items_per_second=26.6501k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/10 1299692 us 1299383 us 1 items_per_second=3.15227k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/2 3874521 us 3873499 us 1 items_per_second=1.05744k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/1 25555 us 25551 us 27 items_per_second=160.306k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/0 17337 us 17332 us 41 items_per_second=236.323k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/10000 6799 us 6798 us 104 items_per_second=150.63k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/100 45711 us 45700 us 15 items_per_second=22.4068k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/10 445610 us 445504 us 2 items_per_second=2.29852k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/2 1191552 us 1191089 us 1 items_per_second=859.718/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/1 9183 us 9182 us 75 items_per_second=111.529k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/0 6818 us 6816 us 104 items_per_second=150.231k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/10000 31751 us 31744 us 22 items_per_second=129.033k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/100 224880 us 224830 us 3 items_per_second=18.2182k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/10 2333998 us 2332975 us 1 items_per_second=1.7557k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/2 5439532 us 5437715 us 1 items_per_second=753.258/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/1 36049 us 36043 us 19 items_per_second=113.64k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/0 27792 us 27785 us 25 items_per_second=147.415k/s null_percent=0 size=4.096k
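A note on reading the output: judging from the counters, the last component of each benchmark name appears to be an inverse null proportion (an argument of k gives null_percent = 100 / k, and 0 means no nulls), and items_per_second appears to equal size divided by the per-iteration CPU time. Both relations are inferred from this output, not taken from the benchmark source; the sketch below, with hypothetical function names, checks them against a few rows of the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Inferred mapping: argument k -> null_percent = 100 / k; k == 0 -> no nulls.
double NullPercentFromArg(int64_t inverse_null_proportion) {
  if (inverse_null_proportion == 0) return 0.0;
  return 100.0 / static_cast<double>(inverse_null_proportion);
}

// Throughput: rows processed per iteration divided by seconds per iteration.
double ItemsPerSecond(int64_t rows, double cpu_time_us) {
  return static_cast<double>(rows) / (cpu_time_us * 1e-6);
}

// Relative-tolerance comparison, since the printed figures are rounded.
bool ApproxEqual(double a, double b, double rel_tol) {
  return std::fabs(a - b) <= rel_tol * std::fabs(b);
}
```

For example, {boolean} with size 1024 took 1329 us of CPU time per iteration, and 1024 / 1.329 ms is about 770k items/s, matching the reported 770.333k/s.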
I find it interesting that some benchmark variations fall to abysmal speeds, for example for {int32, int64}:
GrouperWithMultiTypes/"{int32, int64}"/1024/100 12616 us 12614 us 55 items_per_second=81.1817k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/10 89235 us 89216 us 8 items_per_second=11.4778k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/2 279436 us 279381 us 2 items_per_second=3.66524k/s null_percent=50 size=1.024k
but the same benchmark does not slow down like that for {int32, int32}:
GrouperWithMultiTypes/"{int32, int32}"/1024/100 2020 us 2019 us 344 items_per_second=507.07k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/10 2114 us 2114 us 331 items_per_second=484.407k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/2 2318 us 2317 us 303 items_per_second=441.867k/s null_percent=50 size=1.024k
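To put numbers on the gap, dividing the {int32, int32} throughput by the {int32, int64} throughput at the same null percentage (figures copied from the table above, in k items/s) gives roughly 6x at 1% nulls, 42x at 10% nulls and 120x at 50% nulls. The helper name below is hypothetical; it only makes the arithmetic explicit.

```cpp
#include <cassert>
#include <cmath>

// How many times slower the second configuration is than the first,
// given two throughput figures in the same unit (e.g. k items/s).
double SlowdownFactor(double fast_items_per_second,
                      double slow_items_per_second) {
  return fast_items_per_second / slow_items_per_second;
}
```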
Ah, nice catch. The performance problem here is likely due to two causes:
- A high ratio of randomly placed nulls (50% is the worst case in these runs) increases the cost of comparison, because CompareColumnsToRows has to take more branches. https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L382-L391
- int32+int64 performs much worse than int32+int32 because the differing col_width of the key columns sends rows down different branches during comparison, which disrupts the CPU pipeline.
https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L176-L199
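A simplified, hypothetical sketch of the two branch points described above. This is not Arrow's CompareColumnsToRows, which works on an encoded row format with its own fast paths; the FixedWidthColumn type and helper names are invented for illustration. Each comparison first checks validity (a data-dependent branch when nulls are placed at random), then dispatches on the column width (a second branch that differs between int32 and int64 columns).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical column: fixed-width values stored contiguously plus validity.
struct FixedWidthColumn {
  int width;                  // 4 for int32, 8 for int64
  std::vector<uint8_t> data;  // `width` bytes per row
  std::vector<bool> valid;    // false means null
};

FixedWidthColumn MakeInt32Column(const std::vector<int32_t>& values,
                                 const std::vector<bool>& valid) {
  FixedWidthColumn col{4, std::vector<uint8_t>(values.size() * 4), valid};
  std::memcpy(col.data.data(), values.data(), values.size() * 4);
  return col;
}

// Compare row i of `left` with row j of `right` (same width assumed).
// As in group-by, two nulls are treated as equal; null vs non-null is not.
bool RowsEqual(const FixedWidthColumn& left, int64_t i,
               const FixedWidthColumn& right, int64_t j) {
  const bool lv = left.valid[i];
  const bool rv = right.valid[j];
  if (!lv || !rv) return lv == rv;  // branch 1: depends on the null pattern
  const uint8_t* a = left.data.data() + i * left.width;
  const uint8_t* b = right.data.data() + j * right.width;
  switch (left.width) {  // branch 2: depends on the column width
    case 4: {
      uint32_t x, y;
      std::memcpy(&x, a, 4);
      std::memcpy(&y, b, 4);
      return x == y;
    }
    case 8: {
      uint64_t x, y;
      std::memcpy(&x, a, 8);
      std::memcpy(&y, b, 8);
      return x == y;
    }
    default:
      return std::memcmp(a, b, left.width) == 0;
  }
}
```

With nulls at around 50%, branch 1 is essentially a coin flip for the branch predictor; with keys of mixed widths, branch 2 alternates between cases as well.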
@ZhangHuiGui I have lost track of what this PR's status is. Does it need to wait for other PRs? Do you intend to bring further changes to it?
This PR only adds a Grouper performance benchmark and is largely independent of other work. A performance problem this benchmark uncovered (#41233) will be fixed in separate PRs. I don't think anything else needs to be added to this PR.
@pitrou It looks like this PR can move forward?
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 28ab4afef423613c20cbe4171c29f9dad258b136.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.