GH-39669: [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric
Rationale for this change
The Gandiva microbenchmarks only report the time per iteration (in nanoseconds, microseconds, and so on). That is tedious to read and difficult to interpret.
What changes are included in this PR?
Ensure that Gandiva benchmarks present an items/second and/or a bytes/second metric where that makes sense.
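For readers unfamiliar with these metrics, here is a minimal sketch of what they mean. It is not the code from this PR. `Throughput` and `ComputeThroughput` are hypothetical names invented for illustration. In Google Benchmark the usual way to get these counters is to call `benchmark::State::SetItemsProcessed` and `benchmark::State::SetBytesProcessed` with run totals; the library then divides by the elapsed time, which is roughly what this helper does by hand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of Gandiva or Google Benchmark.
struct Throughput {
  double items_per_second;
  double bytes_per_second;
};

// Derive rate metrics from totals accumulated over a whole benchmark run.
// Assumption: every iteration processes the same number of fixed-width items.
inline Throughput ComputeThroughput(int64_t iterations,
                                    int64_t items_per_iteration,
                                    int64_t bytes_per_item,
                                    double elapsed_seconds) {
  assert(elapsed_seconds > 0.0);
  const double total_items = static_cast<double>(iterations) *
                             static_cast<double>(items_per_iteration);
  const double total_bytes = total_items * static_cast<double>(bytes_per_item);
  return {total_items / elapsed_seconds, total_bytes / elapsed_seconds};
}
```

For example, 10 iterations over 1000 four-byte items in 2 seconds gives 5000 items/s and 20000 bytes/s. Unlike a raw iteration time, these numbers stay comparable when the batch size changes.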
Are these changes tested?
Manually
Are there any user-facing changes?
No
- GitHub Issue: #39669
:warning: GitHub issue #39669 has been automatically assigned in GitHub to PR creator.
I attached the benchmark result.
gandiva-micro-benchmarks.txt
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
2024-03-09T21:05:13+09:00
Running /Users/lama/workspace/arrow-new/cpp/cmake-build-debug/debug/gandiva-micro-benchmarks
Run on (10 X 23.9997 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x10)
Load Average: 5.97, 4.98, 4.91
***WARNING*** Library was built as DEBUG. Timings may be affected.
/Users/lama/workspace/arrow-new/cpp/src/gandiva/cache.cc:46: Creating gandiva cache with capacity of 5000
/Users/lama/workspace/arrow-new/cpp/src/gandiva/engine.cc:276: Detected CPU Name : apple-m1
/Users/lama/workspace/arrow-new/cpp/src/gandiva/engine.cc:277: Detected CPU Features: []
----------------------------------------------------------------------------------------------------------------------------
Benchmark                             Time        CPU  Iterations
----------------------------------------------------------------------------------------------------------------------------
TimedTestExprCompilation          16079 us   15939 us          34
TimedTestAdd3                      2867 us    2835 us         250  bytes_per_second=8.46676Mi/s items_per_second=369.918M/s
TimedTestBigNested                 9505 us    9427 us          73  bytes_per_second=1.45319Mi/s items_per_second=111.236M/s
TimedTestExtractYear               9019 us    8895 us          78  bytes_per_second=2.88265Mi/s items_per_second=117.884M/s
TimedTestFilterAdd2                4165 us    4143 us         168  bytes_per_second=8.62034Mi/s items_per_second=253.094M/s
TimedTestFilterLike               13788 us   13673 us          51  bytes_per_second=8.60435Mi/s items_per_second=76.6896M/s
TimedTestCastFloatFromString      71768 us   71089 us          10  bytes_per_second=8.44014Mi/s items_per_second=14.7502M/s
TimedTestCastIntFromString        39291 us   39103 us          18  bytes_per_second=8.52441Mi/s items_per_second=26.8155M/s
TimedTestAllocs                  118823 us  118236 us           6  bytes_per_second=8.45765Mi/s items_per_second=8.86849M/s
TimedTestOutputStringAllocs      200606 us  199705 us           4  bytes_per_second=7.51106Mi/s items_per_second=5.25061M/s
TimedTestMultiOr                   9325 us    9247 us          75  bytes_per_second=8.65125Mi/s items_per_second=11.0736M/s
TimedTestInExpr                   24309 us   23456 us          29  bytes_per_second=8.82053Mi/s items_per_second=4.36558M/s
DecimalAdd2Fast                    3931 us    3873 us         179  bytes_per_second=11.5409Mi/s items_per_second=270.772M/s
DecimalAdd2LeadingZeroes           7846 us    7386 us          98  bytes_per_second=11.0523Mi/s items_per_second=141.967M/s
DecimalAdd2LeadingZeroesWithDiv   27088 us   26270 us          26  bytes_per_second=11.7126Mi/s items_per_second=39.9149M/s
DecimalAdd2Large                 124786 us  122303 us           6  bytes_per_second=10.9019Mi/s items_per_second=8.5736M/s
DecimalAdd3Fast                    4173 us    4129 us         169  bytes_per_second=17.197Mi/s items_per_second=253.956M/s
DecimalAdd3LeadingZeroes          10569 us   10498 us          66  bytes_per_second=17.3201Mi/s items_per_second=99.8882M/s
DecimalAdd3LeadingZeroesWithDiv   66255 us   64474 us          11  bytes_per_second=16.9201Mi/s items_per_second=16.2635M/s
DecimalAdd3Large                 244734 us  243209 us           3  bytes_per_second=16.4468Mi/s items_per_second=4.31143M/s
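As a rough sanity check on the output above, the per-iteration item count can be recovered from a row by multiplying the reported rate by the per-iteration CPU time. This sketch assumes the rate counters are computed against CPU time, which I believe is what Google Benchmark does unless real time is requested. `ItemsPerIteration` is a hypothetical helper, not something from the benchmark code.

```cpp
// Hypothetical helper: estimate how many items one iteration processed,
// given the reported items_per_second and the per-iteration CPU time in us.
// Assumption: the rate was computed against CPU time, not wall-clock time.
inline double ItemsPerIteration(double items_per_second, double cpu_time_us) {
  const double cpu_time_seconds = cpu_time_us * 1e-6;
  return items_per_second * cpu_time_seconds;
}
```

For the `TimedTestAdd3` row (369.918M items/s, 2835 us CPU) this gives roughly 1.05 million items per iteration. The result is only approximate because the printed times are rounded.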
Thank you for the review.
Based on what I've looked into, it seems like the changes needed for the Gandiva benchmark are only in this file.
https://github.com/search?q=repo%3Aapache%2Farrow%20path%3A%2F%5Ecpp%5C%2Fsrc%5C%2Fgandiva%5C%2Ftests%5C%2F%2F%20BENCHMARK&type=code
OK. I'll merge this.
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 7ee25f1616bfb73bd2d76a832a89303492ab302d.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.