
GH-39669: [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric

llama90 opened this issue 1 year ago • 3 comments

Rationale for this change

The Gandiva microbenchmarks report only a per-iteration time (in nanoseconds, microseconds, and so on). That is tedious to read and hard to interpret.

What changes are included in this PR?

Ensure that the Gandiva benchmarks report an items/second and/or a bytes/second metric where that makes sense.
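For readers unfamiliar with these counters: in Google Benchmark, a benchmark body typically reports its work through `state.SetItemsProcessed(...)` and `state.SetBytesProcessed(...)`, for example `state.SetItemsProcessed(state.iterations() * num_rows)`, where `num_rows` is a hypothetical per-iteration row count and not a name taken from this PR. By default (single thread, no `UseRealTime()`), the library prints that total divided by the accumulated CPU time. The sketch below only models that arithmetic with hypothetical helper names; it is not the PR's code and does not call Google Benchmark.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Google Benchmark or Gandiva): models how a
// rate counter such as items_per_second is derived, i.e. the total amount
// processed across all iterations divided by the total CPU time in seconds.
double RatePerSecond(std::int64_t amount_per_iteration, std::int64_t iterations,
                     double total_cpu_seconds) {
  assert(total_cpu_seconds > 0.0);
  return static_cast<double>(amount_per_iteration) *
         static_cast<double>(iterations) / total_cpu_seconds;
}

// Hypothetical helper: bytes_per_second is printed with binary prefixes
// ("Mi/s"), so dividing by 1024 * 1024 gives the displayed value.
double ToMebibytesPerSecond(double bytes_per_second) {
  return bytes_per_second / (1024.0 * 1024.0);
}
```

For instance, 2^20 items per iteration over 10 iterations that take 0.01 s of CPU time in total works out to about 1.05 billion items per second, a number that is far easier to compare across benchmarks than a raw iteration time.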

Are these changes tested?

Manually

Are there any user-facing changes?

No

  • GitHub Issue: #39669

llama90 • Mar 09 '24 12:03

⚠️ GitHub issue #39669 has been automatically assigned in GitHub to PR creator.

github-actions[bot] • Mar 09 '24 12:03

I've attached the benchmark results.

gandiva-micro-benchmarks.txt
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
2024-03-09T21:05:13+09:00
Running /Users/lama/workspace/arrow-new/cpp/cmake-build-debug/debug/gandiva-micro-benchmarks
Run on (10 X 23.9997 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 5.97, 4.98, 4.91
***WARNING*** Library was built as DEBUG. Timings may be affected.
/Users/lama/workspace/arrow-new/cpp/src/gandiva/cache.cc:46: Creating gandiva cache with capacity of 5000
/Users/lama/workspace/arrow-new/cpp/src/gandiva/engine.cc:276: Detected CPU Name : apple-m1
/Users/lama/workspace/arrow-new/cpp/src/gandiva/engine.cc:277: Detected CPU Features: []
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
TimedTestExprCompilation             16079 us        15939 us           34
TimedTestAdd3                         2867 us         2835 us          250 bytes_per_second=8.46676Mi/s items_per_second=369.918M/s
TimedTestBigNested                    9505 us         9427 us           73 bytes_per_second=1.45319Mi/s items_per_second=111.236M/s
TimedTestExtractYear                  9019 us         8895 us           78 bytes_per_second=2.88265Mi/s items_per_second=117.884M/s
TimedTestFilterAdd2                   4165 us         4143 us          168 bytes_per_second=8.62034Mi/s items_per_second=253.094M/s
TimedTestFilterLike                  13788 us        13673 us           51 bytes_per_second=8.60435Mi/s items_per_second=76.6896M/s
TimedTestCastFloatFromString         71768 us        71089 us           10 bytes_per_second=8.44014Mi/s items_per_second=14.7502M/s
TimedTestCastIntFromString           39291 us        39103 us           18 bytes_per_second=8.52441Mi/s items_per_second=26.8155M/s
TimedTestAllocs                     118823 us       118236 us            6 bytes_per_second=8.45765Mi/s items_per_second=8.86849M/s
TimedTestOutputStringAllocs         200606 us       199705 us            4 bytes_per_second=7.51106Mi/s items_per_second=5.25061M/s
TimedTestMultiOr                      9325 us         9247 us           75 bytes_per_second=8.65125Mi/s items_per_second=11.0736M/s
TimedTestInExpr                      24309 us        23456 us           29 bytes_per_second=8.82053Mi/s items_per_second=4.36558M/s
DecimalAdd2Fast                       3931 us         3873 us          179 bytes_per_second=11.5409Mi/s items_per_second=270.772M/s
DecimalAdd2LeadingZeroes              7846 us         7386 us           98 bytes_per_second=11.0523Mi/s items_per_second=141.967M/s
DecimalAdd2LeadingZeroesWithDiv      27088 us        26270 us           26 bytes_per_second=11.7126Mi/s items_per_second=39.9149M/s
DecimalAdd2Large                    124786 us       122303 us            6 bytes_per_second=10.9019Mi/s items_per_second=8.5736M/s
DecimalAdd3Fast                       4173 us         4129 us          169 bytes_per_second=17.197Mi/s items_per_second=253.956M/s
DecimalAdd3LeadingZeroes             10569 us        10498 us           66 bytes_per_second=17.3201Mi/s items_per_second=99.8882M/s
DecimalAdd3LeadingZeroesWithDiv      66255 us        64474 us           11 bytes_per_second=16.9201Mi/s items_per_second=16.2635M/s
DecimalAdd3Large                    244734 us       243209 us            3 bytes_per_second=16.4468Mi/s items_per_second=4.31143M/s
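Reading the table the other way round is a quick consistency check: multiplying `items_per_second` by the per-iteration CPU time gives the number of items each iteration processed. The helper below is a hypothetical sketch of that arithmetic. Because the printed values are rounded, the result is approximate, and what counts as an "item" in each benchmark is defined by the benchmark code, which is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: recovers items per iteration from a printed rate and
// the printed per-iteration CPU time (in microseconds). Google Benchmark
// divides rate counters by CPU time by default, so this inverts that step.
std::int64_t ItemsPerIteration(double items_per_second, double cpu_time_us) {
  assert(cpu_time_us > 0.0);
  return static_cast<std::int64_t>(
      std::llround(items_per_second * cpu_time_us * 1e-6));
}
```

Plugging in TimedTestAdd3 (`369.918M/s`, `2835 us`) gives about 1,048,718, close to 2^20 = 1,048,576, while TimedTestMultiOr (`11.0736M/s`, `9247 us`) gives about 102,398. Using the wall-clock `Time` column instead (2867 us for TimedTestAdd3) lands noticeably further from 2^20, which is consistent with the rates being computed from CPU time.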

llama90 • Mar 09 '24 12:03

Thank you for the review.

From what I can tell, this file is the only place where the Gandiva benchmarks need this change:

https://github.com/search?q=repo%3Aapache%2Farrow%20path%3A%2F%5Ecpp%5C%2Fsrc%5C%2Fgandiva%5C%2Ftests%5C%2F%2F%20BENCHMARK&type=code

llama90 • Mar 13 '24 00:03

OK. I'll merge this.

kou • Mar 13 '24 07:03

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 7ee25f1616bfb73bd2d76a832a89303492ab302d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.