
GH-41035: [C++] Add a grouper benchmark for preventing performance regression

Open ZhangHuiGui opened this issue 1 year ago • 15 comments

Rationale for this change

Add a grouper benchmark to guard against performance regressions.

See https://github.com/apache/arrow/pull/40998#issuecomment-2039204161

What changes are included in this PR?

Added a benchmark.

Are these changes tested?

Not needed.

Are there any user-facing changes?

No

  • GitHub Issue: #41035

ZhangHuiGui avatar Apr 05 '24 15:04 ZhangHuiGui

:warning: GitHub issue #41035 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Apr 05 '24 15:04 github-actions[bot]

@ursabot please benchmark

ZhangHuiGui avatar Apr 05 '24 15:04 ZhangHuiGui

Benchmark runs are scheduled for commit 620fc87373a76e1805167cb2ed81bb482f10eb1a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

ursabot avatar Apr 05 '24 15:04 ursabot

@pitrou @westonpace PTAL?

ZhangHuiGui avatar Apr 05 '24 15:04 ZhangHuiGui

Thanks for your patience. Conbench analyzed the 7 benchmarking runs that have been run so far on PR commit 620fc87373a76e1805167cb2ed81bb482f10eb1a.

There was 1 benchmark result with an error:

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details.

@ursabot please benchmark

ZhangHuiGui avatar Apr 10 '24 14:04 ZhangHuiGui

Benchmark runs are scheduled for commit 001d4c30a2d7d886fa5b452803e8be84efb3f82e. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

ursabot avatar Apr 10 '24 14:04 ursabot

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 001d4c30a2d7d886fa5b452803e8be84efb3f82e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

I would really prefer items/s over bytes/s as bytes/s is highly misleading. In the end, we are most of the time interested in performance over logical items, not in how large the data representation is.

pitrou avatar Apr 10 '24 19:04 pitrou

(that said, it's not forbidden to add both)

pitrou avatar Apr 10 '24 19:04 pitrou
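To illustrate the point about bytes/s, here is a minimal sketch, not taken from the PR, of how the two rates diverge for variable-length keys. `Throughput` and `MeasureStrings` are hypothetical names, and the byte count deliberately ignores offsets and validity bitmaps.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Throughput figures for one timed run.
struct Throughput {
  double items_per_second;
  double bytes_per_second;
};

// Computes both rates for a batch of strings processed in `seconds`.
// Assumption: only the character data is counted as "bytes".
Throughput MeasureStrings(const std::vector<std::string>& values, double seconds) {
  assert(seconds > 0.0);
  std::int64_t bytes = 0;
  for (const std::string& v : values) {
    bytes += static_cast<std::int64_t>(v.size());
  }
  return {static_cast<double>(values.size()) / seconds,
          static_cast<double>(bytes) / seconds};
}
```

For the same number of keys and the same elapsed time, items/s is identical whether the strings are 4 or 400 characters long, while bytes/s differs by a factor of 100. In Google Benchmark both rates can be reported by calling `SetItemsProcessed` and `SetBytesProcessed` on the benchmark state.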

One minor nit is that I prefer bytes processed over items processed (it is more comparable and I don't have to worry about knowing as many details of the benchmark) but I won't force that.

As pitrou said, bytes/s is not accurate in this benchmark because the keys include a variable-length type, which can make the bytes/s figure very high, and it is also less intuitive than items/s.

The results are as follows:

2024-04-11T09:04:45+08:00
Running ./arrow-compute-grouper-benchmark
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 1.18, 1.64, 1.97
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------
GrouperWithMultiTypes/"{boolean}"/1024/10000                                                   1329 us         1329 us          526 items_per_second=770.333k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/100                                                     1508 us         1507 us          465 items_per_second=679.37k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/10                                                      1534 us         1533 us          457 items_per_second=667.84k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/2                                                       1616 us         1616 us          432 items_per_second=633.552k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/1                                                       1720 us         1720 us          405 items_per_second=595.489k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean}"/1024/0                                                       1333 us         1333 us          527 items_per_second=768.172k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean}"/4096/10000                                                   5127 us         5127 us          137 items_per_second=798.951k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/100                                                     5819 us         5818 us          120 items_per_second=703.99k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/10                                                      5877 us         5876 us          118 items_per_second=697.06k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/2                                                       6308 us         6307 us          112 items_per_second=649.449k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/1                                                       6641 us         6640 us          105 items_per_second=616.851k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean}"/4096/0                                                       5121 us         5120 us          136 items_per_second=800.072k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32}"/1024/10000                                                     1368 us         1368 us          513 items_per_second=748.798k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/100                                                       1620 us         1620 us          432 items_per_second=632.098k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/10                                                        1597 us         1597 us          439 items_per_second=641.304k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/2                                                         1625 us         1624 us          431 items_per_second=630.423k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/1                                                         1637 us         1636 us          428 items_per_second=625.837k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32}"/1024/0                                                         1379 us         1378 us          508 items_per_second=743.075k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32}"/4096/10000                                                     5402 us         5401 us          130 items_per_second=758.434k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/100                                                       6196 us         6195 us          113 items_per_second=661.201k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/10                                                        6165 us         6164 us          113 items_per_second=664.549k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/2                                                         6300 us         6299 us          111 items_per_second=650.222k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/1                                                         6287 us         6286 us          111 items_per_second=651.6k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32}"/4096/0                                                         5412 us         5411 us          129 items_per_second=757.029k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int64}"/1024/10000                                                     1347 us         1347 us          518 items_per_second=760.192k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/100                                                       1569 us         1568 us          447 items_per_second=652.955k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/10                                                        1577 us         1576 us          444 items_per_second=649.554k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/2                                                         1604 us         1604 us          435 items_per_second=638.412k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/1                                                         1607 us         1607 us          435 items_per_second=637.123k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int64}"/1024/0                                                         1374 us         1374 us          522 items_per_second=745.246k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int64}"/4096/10000                                                     5371 us         5370 us          129 items_per_second=762.703k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/100                                                       6181 us         6180 us          113 items_per_second=662.826k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/10                                                        6157 us         6156 us          113 items_per_second=665.384k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/2                                                         6234 us         6232 us          112 items_per_second=657.237k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/1                                                         6251 us         6250 us          113 items_per_second=655.382k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int64}"/4096/0                                                         5371 us         5370 us          130 items_per_second=762.709k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{utf8}"/1024/10000                                                      2497 us         2496 us          281 items_per_second=410.19k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/100                                                        2694 us         2694 us          260 items_per_second=380.111k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/10                                                         2617 us         2616 us          268 items_per_second=391.371k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/2                                                          2551 us         2551 us          274 items_per_second=401.434k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/1                                                          3188 us         3187 us          220 items_per_second=321.258k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{utf8}"/1024/0                                                          2501 us         2500 us          273 items_per_second=409.55k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{utf8}"/4096/10000                                                     10590 us        10588 us           66 items_per_second=386.85k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/100                                                       10617 us        10615 us           66 items_per_second=385.858k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/10                                                        10379 us        10377 us           67 items_per_second=394.717k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/2                                                         10074 us        10072 us           70 items_per_second=406.684k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/1                                                         12487 us        12485 us           56 items_per_second=328.078k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{utf8}"/4096/0                                                          9820 us         9817 us           71 items_per_second=417.214k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/10000                                    3061 us         3060 us          228 items_per_second=334.659k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/100                                      3316 us         3315 us          209 items_per_second=308.917k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/10                                       3340 us         3339 us          211 items_per_second=306.641k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/2                                        3242 us         3241 us          217 items_per_second=315.953k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/1                                        3157 us         3156 us          221 items_per_second=324.446k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/1024/0                                        3114 us         3114 us          226 items_per_second=328.872k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/10000                                   12839 us        12836 us           55 items_per_second=319.102k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/100                                     13759 us        13755 us           51 items_per_second=297.774k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/10                                      13416 us        13413 us           53 items_per_second=305.371k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/2                                       13103 us        13101 us           54 items_per_second=312.646k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/1                                       12628 us        12626 us           55 items_per_second=324.398k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{fixed_size_binary(128)}"/4096/0                                       12778 us        12775 us           55 items_per_second=320.619k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/10000                                             3102 us         3102 us          226 items_per_second=330.162k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/100                                              13830 us        13827 us           51 items_per_second=74.0558k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/10                                              394675 us       394582 us            2 items_per_second=2.59515k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/2                                              1409485 us      1409178 us            1 items_per_second=726.665/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/1                                                 4343 us         4343 us          162 items_per_second=235.808k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/1024/0                                                 3111 us         3111 us          226 items_per_second=329.148k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/10000                                            13898 us        13894 us           51 items_per_second=294.793k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/100                                              54716 us        54700 us           12 items_per_second=74.8806k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/10                                              941758 us       941554 us            1 items_per_second=4.35026k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/2                                             12871536 us     12868926 us            1 items_per_second=318.286/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/1                                                16971 us        16968 us           41 items_per_second=241.392k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean, utf8}"/4096/0                                                12383 us        12381 us           56 items_per_second=330.834k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/1024/10000                                              1635 us         1634 us          428 items_per_second=626.497k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/100                                                2020 us         2019 us          344 items_per_second=507.07k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/10                                                 2114 us         2114 us          331 items_per_second=484.407k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/2                                                  2318 us         2317 us          303 items_per_second=441.867k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/1                                                  2499 us         2498 us          280 items_per_second=409.852k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/0                                                  1636 us         1636 us          420 items_per_second=625.86k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/4096/10000                                              6523 us         6522 us          108 items_per_second=628.047k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/100                                                8122 us         8120 us           86 items_per_second=504.427k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/10                                                 8473 us         8472 us           83 items_per_second=483.476k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/2                                                  9209 us         9207 us           76 items_per_second=444.856k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/1                                                  9863 us         9861 us           70 items_per_second=415.358k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int32}"/4096/0                                                  6532 us         6531 us          107 items_per_second=627.155k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/1024/10000                                              1628 us         1628 us          431 items_per_second=629.181k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/100                                               12616 us        12614 us           55 items_per_second=81.1817k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/10                                                89235 us        89216 us            8 items_per_second=11.4778k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/2                                                279436 us       279381 us            2 items_per_second=3.66524k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/1                                                  2474 us         2473 us          283 items_per_second=413.992k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/0                                                  1627 us         1627 us          429 items_per_second=629.503k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/4096/10000                                              6484 us         6483 us          107 items_per_second=631.833k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/100                                               49194 us        49186 us           14 items_per_second=83.2752k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/10                                               414187 us       414110 us            2 items_per_second=9.89109k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/2                                               1350538 us      1350126 us            1 items_per_second=3.03379k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/1                                                  9590 us         9588 us           73 items_per_second=427.195k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int64}"/4096/0                                                  6475 us         6474 us          108 items_per_second=632.68k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/10000                                      3679 us         3678 us          190 items_per_second=278.42k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/100                                       17617 us        17614 us           40 items_per_second=58.1361k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/10                                       191058 us       191017 us            4 items_per_second=5.36077k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/2                                        718916 us       718763 us            1 items_per_second=1.42467k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/1                                          5451 us         5450 us          129 items_per_second=187.874k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/1024/0                                          3675 us         3675 us          190 items_per_second=278.664k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/10000                                     17341 us        17338 us           41 items_per_second=236.242k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/100                                      100409 us       100389 us            7 items_per_second=40.8013k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/10                                       921094 us       920851 us            1 items_per_second=4.44806k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/2                                       3632795 us      3631833 us            1 items_per_second=1.12781k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/1                                         21414 us        21411 us           33 items_per_second=191.299k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{boolean, int64, utf8}"/4096/0                                         14967 us        14965 us           46 items_per_second=273.711k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/10000                                      3724 us         3724 us          188 items_per_second=275.008k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/100                                       13209 us        13207 us           53 items_per_second=77.5372k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/10                                       140985 us       140961 us            5 items_per_second=7.26443k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/2                                        500871 us       500788 us            2 items_per_second=2.04478k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/1                                          5491 us         5490 us          128 items_per_second=186.517k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/1024/0                                          3724 us         3723 us          188 items_per_second=275.053k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/10000                                     17473 us        17469 us           40 items_per_second=234.471k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/100                                       73179 us        73165 us            9 items_per_second=55.9829k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/10                                       602449 us       602346 us            1 items_per_second=6.80008k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/2                                       4447236 us      4446227 us            1 items_per_second=921.231/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/1                                         21541 us        21538 us           32 items_per_second=190.176k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, boolean, utf8}"/4096/0                                         15239 us        15236 us           46 items_per_second=268.834k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/10000                               4207 us         4207 us          166 items_per_second=243.428k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/100                                35682 us        35671 us           20 items_per_second=28.7065k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/10                                280647 us       280589 us            2 items_per_second=3.64946k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/2                                 739914 us       739742 us            1 items_per_second=1.38427k/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/1                                   6503 us         6502 us          107 items_per_second=157.495k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/1024/0                                   4218 us         4218 us          166 items_per_second=242.792k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/10000                              17237 us        17233 us           40 items_per_second=237.677k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/100                               153723 us       153695 us            5 items_per_second=26.6501k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/10                               1299692 us      1299383 us            1 items_per_second=3.15227k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/2                                3874521 us      3873499 us            1 items_per_second=1.05744k/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/1                                  25555 us        25551 us           27 items_per_second=160.306k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{int32, int64, boolean, utf8}"/4096/0                                  17337 us        17332 us           41 items_per_second=236.323k/s null_percent=0 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/10000       6799 us         6798 us          104 items_per_second=150.63k/s null_percent=0.01 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/100        45711 us        45700 us           15 items_per_second=22.4068k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/10        445610 us       445504 us            2 items_per_second=2.29852k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/2        1191552 us      1191089 us            1 items_per_second=859.718/s null_percent=50 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/1           9183 us         9182 us           75 items_per_second=111.529k/s null_percent=100 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/1024/0           6818 us         6816 us          104 items_per_second=150.231k/s null_percent=0 size=1.024k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/10000      31751 us        31744 us           22 items_per_second=129.033k/s null_percent=0.01 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/100       224880 us       224830 us            3 items_per_second=18.2182k/s null_percent=1 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/10       2333998 us      2332975 us            1 items_per_second=1.7557k/s null_percent=10 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/2        5439532 us      5437715 us            1 items_per_second=753.258/s null_percent=50 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/1          36049 us        36043 us           19 items_per_second=113.64k/s null_percent=100 size=4.096k
GrouperWithMultiTypes/"{utf8, int32, int64, fixed_size_binary(128), boolean}"/4096/0          27792 us        27785 us           25 items_per_second=147.415k/s null_percent=0 size=4.096k

ZhangHuiGui avatar Apr 11 '24 01:04 ZhangHuiGui
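A note on reading the table above. The sketch below is inferred only from the printed counters, not from the benchmark source: the last component of each benchmark name appears to be an inverse null proportion (10000 gives null_percent=0.01, 2 gives 50, 0 gives 0), and items_per_second matches the `size` counter divided by the CPU time. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption inferred from the log: the last benchmark argument is an
// inverse null proportion, with 0 meaning "no nulls".
double NullPercentFromInverse(std::int64_t inverse) {
  assert(inverse >= 0);
  return inverse == 0 ? 0.0 : 100.0 / static_cast<double>(inverse);
}

// items/s as rows per iteration divided by CPU time per iteration.
double ItemsPerSecond(std::int64_t rows, double cpu_time_us) {
  assert(cpu_time_us > 0.0);
  return static_cast<double>(rows) / (cpu_time_us * 1e-6);
}
```

For example, `ItemsPerSecond(1024, 1329.0)` is about 770.5k/s, close to the 770.333k/s reported for the first `{boolean}` line; the small gap comes from the CPU time being rounded to whole microseconds in the table.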

I find it interesting that some benchmark variations fall to abysmal speeds, for example for {int32, int64}:

GrouperWithMultiTypes/"{int32, int64}"/1024/100                                               12616 us        12614 us           55 items_per_second=81.1817k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/10                                                89235 us        89216 us            8 items_per_second=11.4778k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int64}"/1024/2                                                279436 us       279381 us            2 items_per_second=3.66524k/s null_percent=50 size=1.024k

but not the same benchmark for {int32, int32}:

GrouperWithMultiTypes/"{int32, int32}"/1024/100                                                2020 us         2019 us          344 items_per_second=507.07k/s null_percent=1 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/10                                                 2114 us         2114 us          331 items_per_second=484.407k/s null_percent=10 size=1.024k
GrouperWithMultiTypes/"{int32, int32}"/1024/2                                                  2318 us         2317 us          303 items_per_second=441.867k/s null_percent=50 size=1.024k

pitrou avatar Apr 11 '24 15:04 pitrou
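To put numbers on the gap, the following small sketch divides the per-iteration times of the six lines quoted above. The struct and function names are hypothetical; the times are copied from the quoted 1024-row results.

```cpp
#include <array>
#include <cassert>

// One pair of quoted timings at a given null percentage.
struct QuotedTimes {
  int null_percent;
  double int32_int64_us;  // time for {int32, int64}
  double int32_int32_us;  // time for {int32, int32}
};

// Times copied from the 1024-row lines quoted in the comment above.
const std::array<QuotedTimes, 3> kQuoted = {{
    {1, 12616.0, 2020.0},
    {10, 89235.0, 2114.0},
    {50, 279436.0, 2318.0},
}};

// How many times slower {int32, int64} is than {int32, int32}.
double Slowdown(const QuotedTimes& t) {
  assert(t.int32_int32_us > 0.0);
  return t.int32_int64_us / t.int32_int32_us;
}
```

The ratios come out at roughly 6x, 42x and 121x for 1%, 10% and 50% nulls respectively, so the gap widens sharply as the null percentage rises toward 50.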

Ah, nice catch. The performance problem here is probably due to two things:

  1. A random null ratio of more than 50% increases the cost of comparison, because CompareColumnsToRows has to take more branches. https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L382-L391
  2. int32+int64 performs much worse than int32+int32 because the differing col_width values send the comparison down a different branch for each column, which disrupts the CPU pipeline.

https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L176-L199

ZhangHuiGui avatar Apr 11 '24 15:04 ZhangHuiGui
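The two kinds of branches described in the comment above can be pictured with the simplified sketch below. It is a hypothetical stand-in, not the code in compare_internal.cc: `ValuesEqual` and `KeysEqual` are made-up names, and the real implementation processes many rows at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: compares one fixed-width column value with its
// row-encoded counterpart, dispatching on the column width.
bool ValuesEqual(const std::uint8_t* col, const std::uint8_t* row,
                 std::uint32_t col_width) {
  assert(col_width > 0);
  switch (col_width) {
    case 4: {
      std::uint32_t a, b;
      std::memcpy(&a, col, sizeof(a));
      std::memcpy(&b, row, sizeof(b));
      return a == b;
    }
    case 8: {
      std::uint64_t a, b;
      std::memcpy(&a, col, sizeof(a));
      std::memcpy(&b, row, sizeof(b));
      return a == b;
    }
    default:
      // Generic path for any other width.
      return std::memcmp(col, row, col_width) == 0;
  }
}

// Null-aware equality for grouping keys: two nulls match, and a null
// never matches a non-null value.
bool KeysEqual(bool col_valid, bool row_valid, const std::uint8_t* col,
               const std::uint8_t* row, std::uint32_t col_width) {
  if (!col_valid || !row_valid) {
    return col_valid == row_valid;
  }
  return ValuesEqual(col, row, col_width);
}
```

In this picture, the validity check is a data-dependent branch whenever nulls are placed at random, and the `switch` takes a different path per column when the key columns have different widths. Whether this fully accounts for the measured slowdown is the hypothesis stated in the comment, not something this sketch verifies.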

@ZhangHuiGui I have lost track of what this PR's status is. Does it need to wait for other PRs? Do you intend to bring further changes to it?

pitrou avatar May 06 '24 14:05 pitrou

This PR mainly adds Grouper performance tests and is relatively independent. A performance problem previously discovered through this benchmark (#41233) will be fixed in other PRs. I don't think anything more needs to be added to this PR.

ZhangHuiGui avatar May 07 '24 02:05 ZhangHuiGui

@pitrou It looks like this PR can move forward?

ZhangHuiGui avatar May 21 '24 06:05 ZhangHuiGui

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 28ab4afef423613c20cbe4171c29f9dad258b136.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.