arrow PARQUET-2206: [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip

This adds a microbenchmark for ColumnReader's ReadBatch and Skip. Later, I will add benchmarks for RecordReader's ReadRecords and SkipRecords.

Here are the results from my machine:

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_Skip/0/0/0/1/iterations:1000        11250680 ns     11133405 ns         1000
BM_Skip/0/0/0/1000/iterations:1000       134092 ns       134455 ns         1000
BM_Skip/0/0/0/10000/iterations:1000      175717 ns       175677 ns         1000
BM_Skip/0/0/0/100000/iterations:1000     217368 ns       218672 ns         1000
BM_Skip/0/0/1/1/iterations:1000       150319842 ns    149567587 ns         1000
BM_Skip/0/0/1/1000/iterations:1000       244565 ns       244931 ns         1000
BM_Skip/0/0/1/10000/iterations:1000      115395 ns       115924 ns         1000
BM_Skip/0/0/1/100000/iterations:1000     115241 ns       115916 ns         1000
BM_Skip/1/0/0/1/iterations:1000        23289018 ns     23190677 ns         1000
BM_Skip/1/0/0/1000/iterations:1000       622022 ns       621621 ns         1000
BM_Skip/1/0/0/10000/iterations:1000      540981 ns       540620 ns         1000
BM_Skip/1/0/0/100000/iterations:1000     543156 ns       543126 ns         1000
BM_Skip/1/0/1/1/iterations:1000       149224507 ns    148683644 ns         1000
BM_Skip/1/0/1/1000/iterations:1000       805812 ns       805417 ns         1000
BM_Skip/1/0/1/10000/iterations:1000      702999 ns       700108 ns         1000
BM_Skip/1/0/1/100000/iterations:1000     654163 ns       651947 ns         1000
BM_Skip/1/1/0/1/iterations:1000        33160880 ns     33051386 ns         1000
BM_Skip/1/1/0/1000/iterations:1000       999412 ns       998795 ns         1000
BM_Skip/1/1/0/10000/iterations:1000      815868 ns       814927 ns         1000
BM_Skip/1/1/0/100000/iterations:1000     781166 ns       781112 ns         1000
BM_Skip/1/1/1/1/iterations:1000       165600118 ns    164864530 ns         1000
BM_Skip/1/1/1/1000/iterations:1000      1130975 ns      1130252 ns         1000
BM_Skip/1/1/1/10000/iterations:1000     1009628 ns      1009589 ns         1000
BM_Skip/1/1/1/100000/iterations:1000    1029064 ns      1028726 ns         1000

Oct 26 '22 19:10 fatemehp

@emkornfield Could you review this pull request? Thanks!

Oct 26 '22 19:10 fatemehp

I updated the numbers above after I omitted the two commits from the other PR.

Oct 26 '22 21:10 fatemehp

@emkornfield I have addressed your comments, please take a look.

Oct 26 '22 21:10 fatemehp

https://issues.apache.org/jira/browse/PARQUET-2206

Oct 26 '22 21:10 github-actions[bot]

:warning: Ticket has no components in JIRA, make sure you assign one.

Oct 26 '22 21:10 github-actions[bot]

:warning: Ticket has not been started in JIRA, please click 'Start Progress'.

Oct 26 '22 21:10 github-actions[bot]

@github-actions autotune

Oct 31 '22 18:10 emkornfield

@fatemehp It doesn't look like autotune is working. Could you fix the linting error?

Oct 31 '22 20:10 emkornfield

@emkornfield I have fixed the error.

Oct 31 '22 20:10 fatemehp

Slightly different lint error now: "/arrow/cpp/src/parquet/column_reader_benchmark.cc:0: No copyright message found. You should have a line: "Copyright [year] <Copyright Owner>" [legal/copyright] [5]"

All files need to have the Apache license header at the top; it should be a simple copy and paste.

Nov 01 '22 06:11 emkornfield

@pitrou I have tried to address your comments. Please take a look.

The output looks like this now (for 10 iterations):

---------------------------------------------------------------------------------------------------
Benchmark                                                         Time             CPU   Iterations
---------------------------------------------------------------------------------------------------
BM_Skip/Repetition:0/BatchSize:100/iterations:10            2134030 ns      2107608 ns           10
BM_Skip/Repetition:0/BatchSize:1000/iterations:10            487215 ns       485262 ns           10
BM_Skip/Repetition:0/BatchSize:10000/iterations:10           157823 ns       158910 ns           10
BM_Skip/Repetition:0/BatchSize:100000/iterations:10          180030 ns       180869 ns           10
BM_Skip/Repetition:1/BatchSize:100/iterations:10            4897199 ns      4828400 ns           10
BM_Skip/Repetition:1/BatchSize:1000/iterations:10            902888 ns       903919 ns           10
BM_Skip/Repetition:1/BatchSize:10000/iterations:10           908020 ns       904868 ns           10
BM_Skip/Repetition:1/BatchSize:100000/iterations:10          768673 ns       760210 ns           10
BM_Skip/Repetition:2/BatchSize:100/iterations:10            6107231 ns      6061302 ns           10
BM_Skip/Repetition:2/BatchSize:1000/iterations:10           1443645 ns      1421940 ns           10
BM_Skip/Repetition:2/BatchSize:10000/iterations:10          1129830 ns      1120306 ns           10
BM_Skip/Repetition:2/BatchSize:100000/iterations:10         1510731 ns      1495184 ns           10
BM_ReadBatch/Repetition:0/BatchSize:100/iterations:10        145818 ns       146332 ns           10
BM_ReadBatch/Repetition:0/BatchSize:1000/iterations:10       242347 ns       236740 ns           10
BM_ReadBatch/Repetition:0/BatchSize:10000/iterations:10      166230 ns       166372 ns           10
BM_ReadBatch/Repetition:0/BatchSize:100000/iterations:10     167861 ns       167853 ns           10
BM_ReadBatch/Repetition:1/BatchSize:100/iterations:10       2119537 ns      2116998 ns           10
BM_ReadBatch/Repetition:1/BatchSize:1000/iterations:10       797134 ns       794149 ns           10
BM_ReadBatch/Repetition:1/BatchSize:10000/iterations:10      657227 ns       636867 ns           10
BM_ReadBatch/Repetition:1/BatchSize:100000/iterations:10     549434 ns       547964 ns           10
BM_ReadBatch/Repetition:2/BatchSize:100/iterations:10       3120816 ns      3082428 ns           10
BM_ReadBatch/Repetition:2/BatchSize:1000/iterations:10       989707 ns       979577 ns           10
BM_ReadBatch/Repetition:2/BatchSize:10000/iterations:10     1164059 ns      1161694 ns           10
BM_ReadBatch/Repetition:2/BatchSize:100000/iterations:10     837577 ns       811593 ns           10

Nov 02 '22 20:11 fatemehp

I pushed some changes. Can you check whether they look OK to you, @fatemehp?

Nov 10 '22 17:11 pitrou

For the record, here are the benchmark results on my machine:

---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------
ColumnReaderSkipInt32/Repetition:0/BatchSize:100/iterations:10            2191628 ns      2190581 ns           10 bytes_per_second=2.17676G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:1000/iterations:10            331052 ns       330886 ns           10 bytes_per_second=14.4109G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:10000/iterations:10           157148 ns       156976 ns           10 bytes_per_second=30.3764G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:100000/iterations:10           65870 ns        65674 ns           10 bytes_per_second=72.6066G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:100/iterations:10            6318134 ns      4394207 ns           10 bytes_per_second=591.599M/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:1000/iterations:10            971296 ns       970811 ns           10 bytes_per_second=2.61501G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:10000/iterations:10           734258 ns       733732 ns           10 bytes_per_second=3.45995G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:100000/iterations:10          330200 ns       330065 ns           10 bytes_per_second=7.69144G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:100/iterations:10            5695362 ns      5693852 ns           10 bytes_per_second=484.091M/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:1000/iterations:10           1402463 ns      1401984 ns           10 bytes_per_second=1.91995G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:10000/iterations:10          1148923 ns      1148497 ns           10 bytes_per_second=2.34371G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:100000/iterations:10          525567 ns       524824 ns           10 bytes_per_second=5.12884G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:100/iterations:10        189879 ns       189498 ns           10 bytes_per_second=25.1632G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000/iterations:10       114942 ns       114825 ns           10 bytes_per_second=41.5272G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:10000/iterations:10      115618 ns       115571 ns           10 bytes_per_second=41.2593G/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:100000/iterations:10     116905 ns       116825 ns           10 bytes_per_second=40.8163G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:100/iterations:10       2014855 ns      2014633 ns           10 bytes_per_second=1.26012G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:1000/iterations:10       709143 ns       708919 ns           10 bytes_per_second=3.58105G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:10000/iterations:10      568496 ns       568183 ns           10 bytes_per_second=4.46807G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:100000/iterations:10     547385 ns       547319 ns           10 bytes_per_second=4.63839G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:100/iterations:10       3282103 ns      3281312 ns           10 bytes_per_second=840.013M/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:1000/iterations:10      1131684 ns      1130339 ns           10 bytes_per_second=2.38136G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:10000/iterations:10      885073 ns       885070 ns           10 bytes_per_second=3.04128G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:100000/iterations:10     868194 ns       867931 ns           10 bytes_per_second=3.10133G/s
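
To relate the bytes_per_second counter to the Time/CPU columns, here is a minimal sketch that converts a reported rate back into bytes processed per iteration. It assumes the counter is computed against CPU time and that the M/G suffixes are 1024-based; both are assumptions about how Google Benchmark reports this counter, and `BytesPerIterationSketch`, `kMebi` and `kGibi` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Multipliers for the "M/s" and "G/s" suffixes, assuming 1024-based units.
constexpr double kMebi = 1024.0 * 1024.0;
constexpr double kGibi = 1024.0 * 1024.0 * 1024.0;

// Returns the approximate number of bytes processed in one iteration:
// rate (bytes per second) multiplied by the CPU time of one iteration (seconds).
double BytesPerIterationSketch(double rate_value, double unit_multiplier,
                               int64_t cpu_time_ns) {
  assert(cpu_time_ns > 0);
  const double bytes_per_second = rate_value * unit_multiplier;
  const double seconds = static_cast<double>(cpu_time_ns) * 1e-9;
  return bytes_per_second * seconds;
}
```

For example, `BytesPerIterationSketch(41.5272, kGibi, 114825)` uses the ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000 row and comes out at about 5.12 million bytes per iteration. The same conversion can be used to sanity-check any other row.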

Nov 10 '22 17:11 pitrou

@pitrou the changes look fine. Thanks!

@wgtmac regarding your question about benchmarking other types, that is a good point. I am hoping that this benchmark is a starting point, and we can gradually add to it.

Nov 11 '22 22:11 fatemehp

@fatemehp Is it expected that skipping is slower in most of these micro-benchmarks?

Nov 14 '22 13:11 pitrou

In general, Skip should be on par with ReadBatch or faster. This pull request should improve the Skip numbers: https://github.com/apache/arrow/pull/14509

Also, note that we only see the real benefit of skipping when whole pages are skipped. Otherwise, we are reading the values and throwing them away.
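
To illustrate the whole-page point, here is a simplified model, not the actual parquet-cpp implementation. `SkipCursorSketch` is a hypothetical class that tracks only page sizes and counts how many pages are dropped without decoding versus how many values are decoded just to be discarded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a column chunk: each entry is the value count of one page.
class SkipCursorSketch {
 public:
  explicit SkipCursorSketch(std::vector<int64_t> page_sizes)
      : page_sizes_(std::move(page_sizes)) {}

  // Skips up to `n` values from the current position.
  void Skip(int64_t n) {
    assert(n >= 0);
    while (n > 0 && page_ < page_sizes_.size()) {
      const int64_t remaining = page_sizes_[page_] - offset_;
      if (offset_ == 0 && n >= remaining) {
        // The skip covers an untouched page: drop it without decoding.
        ++pages_dropped_;
        n -= remaining;
        ++page_;
      } else {
        // Partial page: values are decoded and then thrown away.
        const int64_t take = std::min(n, remaining);
        values_decoded_ += take;
        offset_ += take;
        n -= take;
        if (offset_ == page_sizes_[page_]) {
          ++page_;
          offset_ = 0;
        }
      }
    }
  }

  int64_t pages_dropped() const { return pages_dropped_; }
  int64_t values_decoded() const { return values_decoded_; }

 private:
  std::vector<int64_t> page_sizes_;
  std::size_t page_ = 0;       // index of the current page
  int64_t offset_ = 0;         // values already consumed in the current page
  int64_t pages_dropped_ = 0;
  int64_t values_decoded_ = 0;
};
```

In this model, one `Skip(3000)` over three 1000-value pages drops all three pages and decodes nothing, while thirty calls to `Skip(100)` over the same pages decode all 3000 values. That matches the pattern where small batch sizes make Skip behave like a read.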

Nov 14 '22 18:11 fatemehp

Benchmark runs are scheduled for baseline = a068096d9c561b5bffff7211f8d5288a728c3a9a and contender = 3b852e49fec85b57545c6edc6c66d3da93de2c06. 3b852e49fec85b57545c6edc6c66d3da93de2c06 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.34% :arrow_up:0.03%] test-mac-arm
[Finished :arrow_down:0.27% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:1.26% :arrow_up:0.0%] ursa-thinkcentre-m75q

Buildkite builds:
[Finished] 3b852e49 ec2-t3-xlarge-us-east-2
[Finished] 3b852e49 test-mac-arm
[Finished] 3b852e49 ursa-i9-9960x
[Finished] 3b852e49 ursa-thinkcentre-m75q
[Finished] a068096d ec2-t3-xlarge-us-east-2
[Finished] a068096d test-mac-arm
[Finished] a068096d ursa-i9-9960x
[Finished] a068096d ursa-thinkcentre-m75q

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Nov 16 '22 00:11 ursabot