PARQUET-2206: [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip
This adds a microbenchmark for ColumnReader's ReadBatch and Skip. I will add benchmarks for RecordReader's ReadRecords and SkipRecords in a follow-up.
Here are the results from my machine:
-------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------
BM_Skip/0/0/0/1/iterations:1000 11250680 ns 11133405 ns 1000
BM_Skip/0/0/0/1000/iterations:1000 134092 ns 134455 ns 1000
BM_Skip/0/0/0/10000/iterations:1000 175717 ns 175677 ns 1000
BM_Skip/0/0/0/100000/iterations:1000 217368 ns 218672 ns 1000
BM_Skip/0/0/1/1/iterations:1000 150319842 ns 149567587 ns 1000
BM_Skip/0/0/1/1000/iterations:1000 244565 ns 244931 ns 1000
BM_Skip/0/0/1/10000/iterations:1000 115395 ns 115924 ns 1000
BM_Skip/0/0/1/100000/iterations:1000 115241 ns 115916 ns 1000
BM_Skip/1/0/0/1/iterations:1000 23289018 ns 23190677 ns 1000
BM_Skip/1/0/0/1000/iterations:1000 622022 ns 621621 ns 1000
BM_Skip/1/0/0/10000/iterations:1000 540981 ns 540620 ns 1000
BM_Skip/1/0/0/100000/iterations:1000 543156 ns 543126 ns 1000
BM_Skip/1/0/1/1/iterations:1000 149224507 ns 148683644 ns 1000
BM_Skip/1/0/1/1000/iterations:1000 805812 ns 805417 ns 1000
BM_Skip/1/0/1/10000/iterations:1000 702999 ns 700108 ns 1000
BM_Skip/1/0/1/100000/iterations:1000 654163 ns 651947 ns 1000
BM_Skip/1/1/0/1/iterations:1000 33160880 ns 33051386 ns 1000
BM_Skip/1/1/0/1000/iterations:1000 999412 ns 998795 ns 1000
BM_Skip/1/1/0/10000/iterations:1000 815868 ns 814927 ns 1000
BM_Skip/1/1/0/100000/iterations:1000 781166 ns 781112 ns 1000
BM_Skip/1/1/1/1/iterations:1000 165600118 ns 164864530 ns 1000
BM_Skip/1/1/1/1000/iterations:1000 1130975 ns 1130252 ns 1000
BM_Skip/1/1/1/10000/iterations:1000 1009628 ns 1009589 ns 1000
BM_Skip/1/1/1/100000/iterations:1000 1029064 ns 1028726 ns 1000
@emkornfield Could you review this pull request? Thanks!
I updated the numbers above after I omitted the two commits from the other PR.
@emkornfield I have addressed your comments, please take a look.
https://issues.apache.org/jira/browse/PARQUET-2206
:warning: Ticket has no components in JIRA, make sure you assign one.
:warning: Ticket has not been started in JIRA, please click 'Start Progress'.
@github-actions autotune
@fatemehp it doesn't look like autotune is working. Could you fix the linting error?
@emkornfield I have fixed the error.
Slightly different lint error now: "/arrow/cpp/src/parquet/column_reader_benchmark.cc:0: No copyright message found. You should have a line: "Copyright [year] <Copyright Owner>" [legal/copyright] [5]"
All files need to have the Apache license header at the top; it should be a simple copy and paste.
@pitrou I have tried to address your comments. Please take a look.
The output looks like this now (for 10 iterations):
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------------------------------
BM_Skip/Repetition:0/BatchSize:100/iterations:10 2134030 ns 2107608 ns 10
BM_Skip/Repetition:0/BatchSize:1000/iterations:10 487215 ns 485262 ns 10
BM_Skip/Repetition:0/BatchSize:10000/iterations:10 157823 ns 158910 ns 10
BM_Skip/Repetition:0/BatchSize:100000/iterations:10 180030 ns 180869 ns 10
BM_Skip/Repetition:1/BatchSize:100/iterations:10 4897199 ns 4828400 ns 10
BM_Skip/Repetition:1/BatchSize:1000/iterations:10 902888 ns 903919 ns 10
BM_Skip/Repetition:1/BatchSize:10000/iterations:10 908020 ns 904868 ns 10
BM_Skip/Repetition:1/BatchSize:100000/iterations:10 768673 ns 760210 ns 10
BM_Skip/Repetition:2/BatchSize:100/iterations:10 6107231 ns 6061302 ns 10
BM_Skip/Repetition:2/BatchSize:1000/iterations:10 1443645 ns 1421940 ns 10
BM_Skip/Repetition:2/BatchSize:10000/iterations:10 1129830 ns 1120306 ns 10
BM_Skip/Repetition:2/BatchSize:100000/iterations:10 1510731 ns 1495184 ns 10
BM_ReadBatch/Repetition:0/BatchSize:100/iterations:10 145818 ns 146332 ns 10
BM_ReadBatch/Repetition:0/BatchSize:1000/iterations:10 242347 ns 236740 ns 10
BM_ReadBatch/Repetition:0/BatchSize:10000/iterations:10 166230 ns 166372 ns 10
BM_ReadBatch/Repetition:0/BatchSize:100000/iterations:10 167861 ns 167853 ns 10
BM_ReadBatch/Repetition:1/BatchSize:100/iterations:10 2119537 ns 2116998 ns 10
BM_ReadBatch/Repetition:1/BatchSize:1000/iterations:10 797134 ns 794149 ns 10
BM_ReadBatch/Repetition:1/BatchSize:10000/iterations:10 657227 ns 636867 ns 10
BM_ReadBatch/Repetition:1/BatchSize:100000/iterations:10 549434 ns 547964 ns 10
BM_ReadBatch/Repetition:2/BatchSize:100/iterations:10 3120816 ns 3082428 ns 10
BM_ReadBatch/Repetition:2/BatchSize:1000/iterations:10 989707 ns 979577 ns 10
BM_ReadBatch/Repetition:2/BatchSize:10000/iterations:10 1164059 ns 1161694 ns 10
BM_ReadBatch/Repetition:2/BatchSize:100000/iterations:10 837577 ns 811593 ns 10
I pushed some changes. Can you check whether they are OK for you, @fatemehp?
For the record, here are the benchmark results:
---------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------
ColumnReaderSkipInt32/Repetition:0/BatchSize:100/iterations:10 2191628 ns 2190581 ns 10 bytes_per_second=2.17676G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:1000/iterations:10 331052 ns 330886 ns 10 bytes_per_second=14.4109G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:10000/iterations:10 157148 ns 156976 ns 10 bytes_per_second=30.3764G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:100000/iterations:10 65870 ns 65674 ns 10 bytes_per_second=72.6066G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:100/iterations:10 6318134 ns 4394207 ns 10 bytes_per_second=591.599M/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:1000/iterations:10 971296 ns 970811 ns 10 bytes_per_second=2.61501G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:10000/iterations:10 734258 ns 733732 ns 10 bytes_per_second=3.45995G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:100000/iterations:10 330200 ns 330065 ns 10 bytes_per_second=7.69144G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:100/iterations:10 5695362 ns 5693852 ns 10 bytes_per_second=484.091M/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:1000/iterations:10 1402463 ns 1401984 ns 10 bytes_per_second=1.91995G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:10000/iterations:10 1148923 ns 1148497 ns 10 bytes_per_second=2.34371G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:100000/iterations:10 525567 ns 524824 ns 10 bytes_per_second=5.12884G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:100/iterations:10 189879 ns 189498 ns 10 bytes_per_second=25.1632G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000/iterations:10 114942 ns 114825 ns 10 bytes_per_second=41.5272G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:10000/iterations:10 115618 ns 115571 ns 10 bytes_per_second=41.2593G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:100000/iterations:10 116905 ns 116825 ns 10 bytes_per_second=40.8163G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:100/iterations:10 2014855 ns 2014633 ns 10 bytes_per_second=1.26012G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:1000/iterations:10 709143 ns 708919 ns 10 bytes_per_second=3.58105G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:10000/iterations:10 568496 ns 568183 ns 10 bytes_per_second=4.46807G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:100000/iterations:10 547385 ns 547319 ns 10 bytes_per_second=4.63839G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:100/iterations:10 3282103 ns 3281312 ns 10 bytes_per_second=840.013M/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:1000/iterations:10 1131684 ns 1130339 ns 10 bytes_per_second=2.38136G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:10000/iterations:10 885073 ns 885070 ns 10 bytes_per_second=3.04128G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:100000/iterations:10 868194 ns 867931 ns 10 bytes_per_second=3.10133G/s
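As a reading aid for the bytes_per_second column, here is a minimal sketch of how such a rate can be recomputed from a row. It assumes the rate is bytes processed per iteration divided by CPU time per iteration, with G meaning 2^30 bytes; that reading appears consistent with the numbers above but is not stated in this thread. The function name `ThroughputGiBPerSec` is hypothetical, and the figure of about 5,120,000 bytes per iteration is back-solved from the Repetition:0 rows, not taken from the benchmark source (the Repetition:1 and Repetition:2 rows imply different byte counts).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the benchmark in this PR.
// Converts bytes processed per iteration and CPU time per iteration
// (in nanoseconds) into a rate in GiB/s (1 GiB = 2^30 bytes).
inline double ThroughputGiBPerSec(int64_t bytes_per_iteration,
                                  double cpu_ns_per_iteration) {
  assert(bytes_per_iteration >= 0);
  assert(cpu_ns_per_iteration > 0.0);
  const double seconds = cpu_ns_per_iteration * 1e-9;
  const double bytes_per_second =
      static_cast<double>(bytes_per_iteration) / seconds;
  return bytes_per_second / (1024.0 * 1024.0 * 1024.0);
}
```

For example, `ThroughputGiBPerSec(5120000, 2190581)` evaluates to roughly 2.18, which lines up with the ColumnReaderSkipInt32/Repetition:0/BatchSize:100 row under the assumptions stated above.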
@pitrou the changes look fine. Thanks!
@wgtmac regarding your question about benchmarking other types, that is a good point. I am hoping that this benchmark is a starting point, and we can gradually add to it.
@fatemehp Is it expected that skipping is slower in most of these micro-benchmarks?
In general, skip should be on par with read or faster. This pull request should improve the Skip numbers: https://github.com/apache/arrow/pull/14509
Also, note that we only see the real benefit of skipping when whole pages are skipped. Otherwise, we are still reading the values and throwing them away.
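To illustrate the point about whole pages, here is a minimal toy model. It is not the parquet-cpp ColumnReader; `ToyPagedReader` and all of its members are hypothetical, and it only counts work rather than decoding real data. It assumes a page can be dropped without decoding only when the skip starts at a page boundary and covers the entire page; any partial skip has to decode the values it discards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy model of a paged column reader (illustration only).
class ToyPagedReader {
 public:
  explicit ToyPagedReader(std::vector<int64_t> page_sizes)
      : page_sizes_(std::move(page_sizes)) {}

  // Skips up to num_values values and returns how many were skipped.
  int64_t Skip(int64_t num_values) {
    assert(num_values >= 0);
    int64_t skipped = 0;
    while (skipped < num_values && page_index_ < page_sizes_.size()) {
      const int64_t remaining = page_sizes_[page_index_] - offset_in_page_;
      const int64_t wanted = num_values - skipped;
      if (offset_in_page_ == 0 && wanted >= remaining) {
        // The skip covers the whole page: drop it without decoding.
        ++pages_skipped_whole_;
        skipped += remaining;
        ++page_index_;
      } else {
        // Partial page: values are decoded and then thrown away.
        const int64_t n = std::min(wanted, remaining);
        values_decoded_ += n;
        skipped += n;
        offset_in_page_ += n;
        if (offset_in_page_ == page_sizes_[page_index_]) {
          ++page_index_;
          offset_in_page_ = 0;
        }
      }
    }
    return skipped;
  }

  int64_t values_decoded() const { return values_decoded_; }
  int64_t pages_skipped_whole() const { return pages_skipped_whole_; }

 private:
  std::vector<int64_t> page_sizes_;   // number of values in each page
  std::size_t page_index_ = 0;        // index of the current page
  int64_t offset_in_page_ = 0;        // values consumed in the current page
  int64_t values_decoded_ = 0;        // values decoded only to be discarded
  int64_t pages_skipped_whole_ = 0;   // pages dropped without decoding
};
```

In this model, skipping 300 values across three 100-value pages in batches of 10 decodes all 300 values, while a single Skip(300) decodes none. Real readers have other per-call costs as well, so this is only a sketch of why small batch sizes cannot benefit from page-level skipping.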
Benchmark runs are scheduled for baseline = a068096d9c561b5bffff7211f8d5288a728c3a9a and contender = 3b852e49fec85b57545c6edc6c66d3da93de2c06. 3b852e49fec85b57545c6edc6c66d3da93de2c06 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.34% :arrow_up:0.03%] test-mac-arm
[Finished :arrow_down:0.27% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:1.26% :arrow_up:0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3b852e49 ec2-t3-xlarge-us-east-2
[Finished] 3b852e49 test-mac-arm
[Finished] 3b852e49 ursa-i9-9960x
[Finished] 3b852e49 ursa-thinkcentre-m75q
[Finished] a068096d ec2-t3-xlarge-us-east-2
[Finished] a068096d test-mac-arm
[Finished] a068096d ursa-i9-9960x
[Finished] a068096d ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java