GH-41702: [C++][Parquet] Thrift: generate template method to accelerate reading thrift
Rationale for this change
The Thrift deserializer calls many virtual functions. This PR generates a template method for it to reduce those calls.
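To illustrate the idea, here is a minimal sketch of the difference between reading through a virtual protocol interface and reading through a function templated on the concrete protocol type. All names (`VirtualProtocol`, `BufferProtocol`, `Point`, `ReadPointVirtual`, `ReadPointTemplated`) are hypothetical and are not the code Thrift generates; the sketch only assumes that the concrete protocol type is known at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a protocol base class with virtual read methods.
struct VirtualProtocol {
  virtual ~VirtualProtocol() = default;
  virtual uint32_t ReadI32(int32_t* out) = 0;
};

// Hypothetical concrete protocol that reads from an in-memory buffer.
// It is `final`, so a call made through the concrete type can be
// devirtualized and inlined by the compiler.
class BufferProtocol final : public VirtualProtocol {
 public:
  explicit BufferProtocol(std::vector<int32_t> values)
      : values_(std::move(values)) {}

  uint32_t ReadI32(int32_t* out) override {
    assert(pos_ < values_.size());
    *out = values_[pos_++];
    return static_cast<uint32_t>(sizeof(int32_t));
  }

 private:
  std::vector<int32_t> values_;
  std::size_t pos_ = 0;
};

struct Point {
  int32_t x = 0;
  int32_t y = 0;
};

// Reads through the base class: each field read is a virtual call.
uint32_t ReadPointVirtual(VirtualProtocol* prot, Point* p) {
  uint32_t n = prot->ReadI32(&p->x);
  n += prot->ReadI32(&p->y);
  return n;
}

// Templated on the protocol type: when instantiated with BufferProtocol,
// the target of each call is known at compile time.
template <typename Protocol>
uint32_t ReadPointTemplated(Protocol* prot, Point* p) {
  uint32_t n = prot->ReadI32(&p->x);
  n += prot->ReadI32(&p->y);
  return n;
}
```

Both functions return the same result for the same input; calling `ReadPointTemplated(&buffer_protocol, &point)` with a `BufferProtocol` simply gives the compiler the chance to skip the vtable lookup on every field.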
What changes are included in this PR?
- cpp/build-support/update-thrift.sh: add templates as a generator argument
- cpp/src/parquet/thrift_internal.h: use the generated code
Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
No.
- GitHub Issue: #41702
:warning: GitHub issue #41702 has been automatically assigned in GitHub to PR creator.
@emkornfield @pitrou I've updated the patch here. The generated code makes fewer virtual function calls during deserialization. Would you mind taking a look?
I'm not very familiar with the Thrift compiler; there may be other tools that could help speed up deserialization.
@mapleFU I didn't know this was possible. This looks neat in principle. Did you try to run some benchmarks?
I ran it on the page index: https://github.com/apache/arrow/issues/41702#issuecomment-2116873657
It's more useful for the footer, since readVirt is called many more times there.
I remember there was about 3% speedup reading a sample parquet file.
Perhaps you can try with the additional benchmarks in https://github.com/apache/arrow/pull/41761
On my M1 Pro with a Release (O2) build:
After:
Run on (10 X 24.0711 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x10)
Load Average: 7.98, 10.79, 8.83
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
WriteMetadata/num_columns:1/num_row_groups:1 10248 ns 10198 ns 65596 file_size=459 items_per_second=98.0618k/s
WriteMetadata/num_columns:1/num_row_groups:100 708873 ns 701642 ns 1003 file_size=37.383k items_per_second=1.42523k/s
WriteMetadata/num_columns:1/num_row_groups:1000 7027939 ns 7022677 ns 99 file_size=374.885k items_per_second=142.396/s
WriteMetadata/num_columns:10/num_row_groups:1 78750 ns 78709 ns 8900 file_size=3.762k items_per_second=12.705k/s
WriteMetadata/num_columns:10/num_row_groups:100 6751510 ns 6644838 ns 105 file_size=358.835k items_per_second=150.493/s
WriteMetadata/num_columns:10/num_row_groups:1000 67659713 ns 67142800 ns 10 file_size=3.614M items_per_second=14.8936/s
WriteMetadata/num_columns:100/num_row_groups:1 787280 ns 771871 ns 910 file_size=37.352k items_per_second=1.29555k/s
WriteMetadata/num_columns:100/num_row_groups:100 66632500 ns 66540000 ns 10 file_size=3.61693M items_per_second=15.0286/s
WriteMetadata/num_columns:100/num_row_groups:1000 703455917 ns 699385000 ns 1 file_size=36.2887M items_per_second=1.42983/s
WriteMetadata/num_columns:1000/num_row_groups:1 8089713 ns 8087153 ns 85 file_size=376.655k items_per_second=123.653/s
WriteMetadata/num_columns:1000/num_row_groups:100 705972459 ns 702311000 ns 1 file_size=36.4815M items_per_second=1.42387/s
WriteMetadata/num_columns:10000/num_row_groups:1 82793505 ns 82773750 ns 8 file_size=3.82213M items_per_second=12.0811/s
WriteMetadata/num_columns:10000/num_row_groups:100 7789295000 ns 7492551000 ns 1 file_size=369.089M items_per_second=0.133466/s
ReadMetadata/num_columns:1/num_row_groups:1 3022 ns 3021 ns 229889 file_size=459 items_per_second=330.982k/s
ReadMetadata/num_columns:1/num_row_groups:100 59165 ns 59139 ns 11742 file_size=37.383k items_per_second=16.9092k/s
ReadMetadata/num_columns:1/num_row_groups:1000 587111 ns 586972 ns 1189 file_size=374.885k items_per_second=1.70366k/s
ReadMetadata/num_columns:10/num_row_groups:1 13977 ns 13973 ns 50402 file_size=3.762k items_per_second=71.569k/s
ReadMetadata/num_columns:10/num_row_groups:100 475674 ns 475562 ns 1469 file_size=358.835k items_per_second=2.10278k/s
ReadMetadata/num_columns:10/num_row_groups:1000 4743075 ns 4742237 ns 139 file_size=3.614M items_per_second=210.871/s
ReadMetadata/num_columns:100/num_row_groups:1 119355 ns 119308 ns 5747 file_size=37.352k items_per_second=8.38169k/s
ReadMetadata/num_columns:100/num_row_groups:100 5379931 ns 5378835 ns 133 file_size=3.61693M items_per_second=185.914/s
ReadMetadata/num_columns:100/num_row_groups:1000 58173311 ns 58151000 ns 13 file_size=36.2887M items_per_second=17.1966/s
ReadMetadata/num_columns:1000/num_row_groups:1 1285306 ns 1284195 ns 514 file_size=376.655k items_per_second=778.698/s
ReadMetadata/num_columns:1000/num_row_groups:100 59154014 ns 59110667 ns 12 file_size=36.4815M items_per_second=16.9174/s
ReadMetadata/num_columns:10000/num_row_groups:1 15298734 ns 15288065 ns 46 file_size=3.82213M items_per_second=65.4105/s
ReadMetadata/num_columns:10000/num_row_groups:100 597222875 ns 594531000 ns 1 file_size=369.089M items_per_second=1.682/s
Before:
WriteMetadata/num_columns:1/num_row_groups:1 13997 ns 10952 ns 64411 file_size=459 items_per_second=91.3074k/s
WriteMetadata/num_columns:1/num_row_groups:100 1161928 ns 781421 ns 915 file_size=37.383k items_per_second=1.27972k/s
WriteMetadata/num_columns:1/num_row_groups:1000 9028193 ns 7580868 ns 91 file_size=374.885k items_per_second=131.911/s
WriteMetadata/num_columns:10/num_row_groups:1 87804 ns 81408 ns 8680 file_size=3.762k items_per_second=12.2838k/s
WriteMetadata/num_columns:10/num_row_groups:100 7922727 ns 7032396 ns 96 file_size=358.835k items_per_second=142.199/s
WriteMetadata/num_columns:10/num_row_groups:1000 83557727 ns 72335889 ns 9 file_size=3.614M items_per_second=13.8244/s
WriteMetadata/num_columns:100/num_row_groups:1 1046771 ns 866386 ns 813 file_size=37.352k items_per_second=1.15422k/s
WriteMetadata/num_columns:100/num_row_groups:100 97720995 ns 74290111 ns 9 file_size=3.61693M items_per_second=13.4607/s
WriteMetadata/num_columns:100/num_row_groups:1000 1042585917 ns 773579000 ns 1 file_size=36.2887M items_per_second=1.29269/s
WriteMetadata/num_columns:1000/num_row_groups:1 9320268 ns 8396910 ns 78 file_size=376.655k items_per_second=119.091/s
WriteMetadata/num_columns:1000/num_row_groups:100 789198500 ns 726929000 ns 1 file_size=36.4815M items_per_second=1.37565/s
WriteMetadata/num_columns:10000/num_row_groups:1 105553526 ns 89228125 ns 8 file_size=3.82213M items_per_second=11.2072/s
WriteMetadata/num_columns:10000/num_row_groups:100 9705208125 ns 7941607000 ns 1 file_size=369.089M items_per_second=0.125919/s
ReadMetadata/num_columns:1/num_row_groups:1 3341 ns 3262 ns 215501 file_size=459 items_per_second=306.531k/s
ReadMetadata/num_columns:1/num_row_groups:100 70801 ns 67469 ns 10226 file_size=37.383k items_per_second=14.8215k/s
ReadMetadata/num_columns:1/num_row_groups:1000 697046 ns 661042 ns 1033 file_size=374.885k items_per_second=1.51276k/s
ReadMetadata/num_columns:10/num_row_groups:1 19616 ns 15182 ns 46741 file_size=3.762k items_per_second=65.866k/s
ReadMetadata/num_columns:10/num_row_groups:100 631976 ns 538377 ns 1240 file_size=358.835k items_per_second=1.85743k/s
ReadMetadata/num_columns:10/num_row_groups:1000 5701558 ns 5375484 ns 122 file_size=3.614M items_per_second=186.03/s
ReadMetadata/num_columns:100/num_row_groups:1 137789 ns 128750 ns 5466 file_size=37.352k items_per_second=7.76702k/s
ReadMetadata/num_columns:100/num_row_groups:100 6475114 ns 6090483 ns 118 file_size=3.61693M items_per_second=164.191/s
ReadMetadata/num_columns:100/num_row_groups:1000 64411345 ns 62630000 ns 11 file_size=36.2887M items_per_second=15.9668/s
ReadMetadata/num_columns:1000/num_row_groups:1 1473490 ns 1402757 ns 453 file_size=376.655k items_per_second=712.882/s
ReadMetadata/num_columns:1000/num_row_groups:100 66037220 ns 64025909 ns 11 file_size=36.4815M items_per_second=15.6187/s
ReadMetadata/num_columns:10000/num_row_groups:1 18425749 ns 16564045 ns 44 file_size=3.82213M items_per_second=60.3717/s
ReadMetadata/num_columns:10000/num_row_groups:100 650862958 ns 636789000 ns 1 file_size=369.089M items_per_second=1.57038/s
On my AMD 3800X:
Before:
WriteMetadata/num_columns:1/num_row_groups:1 14869 ns 14869 ns 42700 file_size=459 items_per_second=67.2552k/s
WriteMetadata/num_columns:1/num_row_groups:100 1026862 ns 1026848 ns 689 file_size=37.383k items_per_second=973.854/s
WriteMetadata/num_columns:1/num_row_groups:1000 9657576 ns 9656124 ns 72 file_size=374.885k items_per_second=103.561/s
WriteMetadata/num_columns:10/num_row_groups:1 121405 ns 121406 ns 5869 file_size=3.762k items_per_second=8.23686k/s
WriteMetadata/num_columns:10/num_row_groups:100 9488113 ns 9488130 ns 73 file_size=358.835k items_per_second=105.395/s
WriteMetadata/num_columns:10/num_row_groups:1000 98853564 ns 98852700 ns 7 file_size=3.614M items_per_second=10.1161/s
WriteMetadata/num_columns:100/num_row_groups:1 1142870 ns 1142808 ns 629 file_size=37.352k items_per_second=875.037/s
WriteMetadata/num_columns:100/num_row_groups:100 96569070 ns 96568757 ns 7 file_size=3.61693M items_per_second=10.3553/s
WriteMetadata/num_columns:100/num_row_groups:1000 1017437093 ns 1017435400 ns 1 file_size=36.2887M items_per_second=0.982863/s
WriteMetadata/num_columns:1000/num_row_groups:1 11040304 ns 11040197 ns 65 file_size=376.655k items_per_second=90.5781/s
WriteMetadata/num_columns:1000/num_row_groups:100 995932342 ns 995929600 ns 1 file_size=36.4815M items_per_second=1.00409/s
WriteMetadata/num_columns:10000/num_row_groups:1 114961261 ns 114961450 ns 6 file_size=3.82213M items_per_second=8.69857/s
WriteMetadata/num_columns:10000/num_row_groups:100 1.6961e+10 ns 1.6960e+10 ns 1 file_size=369.089M items_per_second=0.0589634/s
ReadMetadata/num_columns:1/num_row_groups:1 6150 ns 6150 ns 95609 file_size=459 items_per_second=162.615k/s
ReadMetadata/num_columns:1/num_row_groups:100 148555 ns 148554 ns 5156 file_size=37.383k items_per_second=6.73154k/s
ReadMetadata/num_columns:1/num_row_groups:1000 1383664 ns 1383603 ns 549 file_size=374.885k items_per_second=722.751/s
ReadMetadata/num_columns:10/num_row_groups:1 31549 ns 31548 ns 16761 file_size=3.762k items_per_second=31.6973k/s
ReadMetadata/num_columns:10/num_row_groups:100 1329978 ns 1329950 ns 486 file_size=358.835k items_per_second=751.908/s
ReadMetadata/num_columns:10/num_row_groups:1000 15798009 ns 15797961 ns 44 file_size=3.614M items_per_second=63.2993/s
ReadMetadata/num_columns:100/num_row_groups:1 297319 ns 297316 ns 2119 file_size=37.352k items_per_second=3.36343k/s
ReadMetadata/num_columns:100/num_row_groups:100 13742747 ns 13742598 ns 49 file_size=3.61693M items_per_second=72.7664/s
ReadMetadata/num_columns:100/num_row_groups:1000 130178737 ns 130176500 ns 5 file_size=36.2887M items_per_second=7.68188/s
ReadMetadata/num_columns:1000/num_row_groups:1 2862534 ns 2862405 ns 260 file_size=376.655k items_per_second=349.357/s
ReadMetadata/num_columns:1000/num_row_groups:100 79884243 ns 79869014 ns 7 file_size=36.4815M items_per_second=12.5205/s
ReadMetadata/num_columns:10000/num_row_groups:1 18818536 ns 18818281 ns 37 file_size=3.82213M items_per_second=53.1398/s
ReadMetadata/num_columns:10000/num_row_groups:100 788936700 ns 788847500 ns 1 file_size=369.089M items_per_second=1.26767/s
After:
WriteMetadata/num_columns:1/num_row_groups:1 14042 ns 14026 ns 48265 file_size=459 items_per_second=71.2951k/s
WriteMetadata/num_columns:1/num_row_groups:100 982543 ns 982545 ns 693 file_size=37.383k items_per_second=1.01776k/s
WriteMetadata/num_columns:1/num_row_groups:1000 9236559 ns 9234951 ns 75 file_size=374.885k items_per_second=108.284/s
WriteMetadata/num_columns:10/num_row_groups:1 115867 ns 115865 ns 6050 file_size=3.762k items_per_second=8.63075k/s
WriteMetadata/num_columns:10/num_row_groups:100 9106303 ns 9106322 ns 77 file_size=358.835k items_per_second=109.814/s
WriteMetadata/num_columns:10/num_row_groups:1000 95039480 ns 95039886 ns 7 file_size=3.614M items_per_second=10.5219/s
WriteMetadata/num_columns:100/num_row_groups:1 1066471 ns 1066474 ns 648 file_size=37.352k items_per_second=937.67/s
WriteMetadata/num_columns:100/num_row_groups:100 92350381 ns 92350900 ns 8 file_size=3.61693M items_per_second=10.8283/s
WriteMetadata/num_columns:100/num_row_groups:1000 972198408 ns 971689600 ns 1 file_size=36.2887M items_per_second=1.02914/s
WriteMetadata/num_columns:1000/num_row_groups:1 10303438 ns 10302799 ns 68 file_size=376.655k items_per_second=97.061/s
WriteMetadata/num_columns:1000/num_row_groups:100 926151272 ns 926026200 ns 1 file_size=36.4815M items_per_second=1.07988/s
WriteMetadata/num_columns:10000/num_row_groups:1 109520337 ns 109283500 ns 6 file_size=3.82213M items_per_second=9.15051/s
WriteMetadata/num_columns:10000/num_row_groups:100 9607536338 ns 9603598900 ns 1 file_size=369.089M items_per_second=0.104128/s
ReadMetadata/num_columns:1/num_row_groups:1 3776 ns 3737 ns 190309 file_size=459 items_per_second=267.588k/s
ReadMetadata/num_columns:1/num_row_groups:100 76296 ns 76114 ns 9217 file_size=37.383k items_per_second=13.1382k/s
ReadMetadata/num_columns:1/num_row_groups:1000 706469 ns 706463 ns 993 file_size=374.885k items_per_second=1.4155k/s
ReadMetadata/num_columns:10/num_row_groups:1 18738 ns 18738 ns 35672 file_size=3.762k items_per_second=53.3679k/s
ReadMetadata/num_columns:10/num_row_groups:100 590179 ns 590180 ns 1202 file_size=358.835k items_per_second=1.6944k/s
ReadMetadata/num_columns:10/num_row_groups:1000 5821858 ns 5821727 ns 123 file_size=3.614M items_per_second=171.77/s
ReadMetadata/num_columns:100/num_row_groups:1 168284 ns 168284 ns 4074 file_size=37.352k items_per_second=5.94234k/s
ReadMetadata/num_columns:100/num_row_groups:100 5752814 ns 5752800 ns 118 file_size=3.61693M items_per_second=173.828/s
ReadMetadata/num_columns:100/num_row_groups:1000 65674677 ns 65672427 ns 11 file_size=36.2887M items_per_second=15.2271/s
ReadMetadata/num_columns:1000/num_row_groups:1 1574680 ns 1574646 ns 444 file_size=376.655k items_per_second=635.063/s
ReadMetadata/num_columns:1000/num_row_groups:100 65989678 ns 65988873 ns 11 file_size=36.4815M items_per_second=15.1541/s
ReadMetadata/num_columns:10000/num_row_groups:1 16967274 ns 16966876 ns 41 file_size=3.82213M items_per_second=58.9384/s
ReadMetadata/num_columns:10000/num_row_groups:100 652885946 ns 652766800 ns 1 file_size=369.089M items_per_second=1.53194/s
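As a rough way to read the tables above, the relative change for one row can be computed from its before and after CPU times. The helper below is a small sketch (the name `PercentReduction` is made up for illustration); the inputs in the usage note are copied from the ReadMetadata/num_columns:100/num_row_groups:100 rows, and a single row is not representative of the whole benchmark.

```cpp
#include <cassert>

// Percentage by which the time dropped going from `before_ns` to `after_ns`.
// A negative result means the "after" run was slower.
double PercentReduction(double before_ns, double after_ns) {
  assert(before_ns > 0.0);
  return 100.0 * (before_ns - after_ns) / before_ns;
}
```

For example, `PercentReduction(6090483, 5378835)` uses the M1 Pro CPU times for that row, and `PercentReduction(13742598, 5752800)` uses the AMD 3800X CPU times; the first comes out at a little under 12% and the second at roughly 58%.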
@github-actions crossbow submit -g cpp -g wheel
Revision: fde772cb97459f202a1ec3571bc7064a5de72d65
Submitted crossbow builds: ursacomputing/crossbow @ actions-bacf49dea9
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9ba9253e8527a7f3e2c6e47e631e278b8ca84e53.
There were 5 benchmark results indicating a performance regression:
- Commit Run on ursa-i9-9960x at 2024-05-22 19:20:09Z
- and 3 more (see the report linked below)
The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.