GH-40357: [C++] Add benchmark for ToTensor conversions

Open AlenkaF opened this issue 1 year ago • 3 comments

Rationale for this change

We should add benchmarks to be sure not to cause regressions while working on additional implementations of RecordBatch::ToTensor and Table::ToTensor.

What changes are included in this PR?

New cpp/src/arrow/to_tensor_benchmark.cc file.

  • GitHub Issue: #40357

AlenkaF avatar Mar 05 '24 09:03 AlenkaF

Can you show the result of running them? We might also want to use more data to get a more reliable result.

jorisvandenbossche avatar Mar 05 '24 16:03 jorisvandenbossche

This was the result output:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-ahcnq1ah/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 17.32, 18.72, 16.18
----------------------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
RecordBatchUniformTypesSimple        624 ns          624 ns      1125492 bytes_per_second=1.29039Gi/s items_per_second=43.2982M/s

Will use RandomArrayGenerator to generate more data and add the result here.

AlenkaF avatar Mar 06 '24 08:03 AlenkaF

The result from running archery benchmark diff --benchmark-filter=BatchToTensorSimple on the second commit (but with arrays of length 100, not 500):

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-jun4cokj/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 24.95, 25.04, 19.14
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>            550 ns          550 ns      1254345 bytes_per_second=4.06699Gi/s items_per_second=545.863M/s
BatchToTensorSimple<UInt16Type>           555 ns          553 ns      1235570 bytes_per_second=8.08251Gi/s items_per_second=542.408M/s
BatchToTensorSimple<UInt32Type>           569 ns          568 ns      1253335 bytes_per_second=15.7341Gi/s items_per_second=527.949M/s
BatchToTensorSimple<UInt64Type>           580 ns          580 ns      1237449 bytes_per_second=30.8253Gi/s items_per_second=517.163M/s
BatchToTensorSimple<Int8Type>             548 ns          548 ns      1249732 bytes_per_second=4.07944Gi/s items_per_second=547.533M/s
BatchToTensorSimple<Int16Type>            623 ns          568 ns      1233654 bytes_per_second=7.87246Gi/s items_per_second=528.312M/s
BatchToTensorSimple<Int32Type>            565 ns          564 ns      1204923 bytes_per_second=15.8461Gi/s items_per_second=531.706M/s
BatchToTensorSimple<Int64Type>            585 ns          585 ns      1269059 bytes_per_second=30.5699Gi/s items_per_second=512.878M/s
BatchToTensorSimple<HalfFloatType>        545 ns          544 ns      1217900 bytes_per_second=8.21219Gi/s items_per_second=551.111M/s
BatchToTensorSimple<FloatType>            575 ns          574 ns      1239991 bytes_per_second=15.5835Gi/s items_per_second=522.896M/s
BatchToTensorSimple<DoubleType>           567 ns          566 ns      1152074 bytes_per_second=31.5943Gi/s items_per_second=530.065M/s

AlenkaF avatar Mar 06 '24 10:03 AlenkaF

Current output when running archery benchmark diff --benchmark-filter=BatchToTensorSimple:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-e8lvkw1g/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 27.50, 28.87, 23.74
-----------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536/10000             4121 us         4107 us          171 bytes_per_second=15.217Mi/s items_per_second=12.765G/s null_percent=0.01 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/100               4273 us         4219 us          170 bytes_per_second=14.8143Mi/s items_per_second=12.4271G/s null_percent=1 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/10                4019 us         4003 us          173 bytes_per_second=15.6149Mi/s items_per_second=13.0988G/s null_percent=10 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/2                 4100 us         4083 us          136 bytes_per_second=15.3084Mi/s items_per_second=12.8416G/s null_percent=50 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/1                 3972 us         3894 us          178 bytes_per_second=16.0516Mi/s items_per_second=13.465G/s null_percent=100 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/0                 3953 us         3927 us          178 bytes_per_second=15.9142Mi/s items_per_second=13.3498G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304/10000       15398661 us      1947088 us            1 bytes_per_second=2.05435Mi/s items_per_second=1.72331G/s null_percent=0.01 size=4.1943M
.
.
.

AlenkaF avatar Mar 13 '24 13:03 AlenkaF

Output from running the benchmarks on the latest commit:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-y9o8zv4d/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 20.67, 17.39, 10.95
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536           443099 ns       442863 ns         1580 bytes_per_second=141.127Mi/s items_per_second=14.7983G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304       38391076 ns     35795222 ns           18 bytes_per_second=111.747Mi/s items_per_second=11.7175G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536          882040 ns       881129 ns          747 bytes_per_second=70.9318Mi/s items_per_second=7.43773G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304     118462838 ns     81059222 ns            9 bytes_per_second=49.3466Mi/s items_per_second=5.17437G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536         1937139 ns      1933673 ns          361 bytes_per_second=32.3219Mi/s items_per_second=3.3892G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304    1271556625 ns    651396000 ns            1 bytes_per_second=6.14066Mi/s items_per_second=643.895M/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536         4440503 ns      4344614 ns          166 bytes_per_second=14.3856Mi/s items_per_second=1.50844G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304    1.1486e+10 ns   1742537000 ns            1 bytes_per_second=2.2955Mi/s items_per_second=240.701M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536            415187 ns       410957 ns         1710 bytes_per_second=152.084Mi/s items_per_second=15.9472G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304        34241740 ns     33962150 ns           20 bytes_per_second=117.778Mi/s items_per_second=12.3499G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536           812298 ns       810349 ns          917 bytes_per_second=77.1273Mi/s items_per_second=8.08738G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304       75301182 ns     70352375 ns            8 bytes_per_second=56.8566Mi/s items_per_second=5.96185G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536          2033466 ns      2026663 ns          329 bytes_per_second=30.8389Mi/s items_per_second=3.23369G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304     1233238541 ns    562396000 ns            1 bytes_per_second=7.11243Mi/s items_per_second=745.792M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536          3969188 ns      3959770 ns          178 bytes_per_second=15.7837Mi/s items_per_second=1.65505G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304     1.5188e+10 ns   1823171000 ns            1 bytes_per_second=2.19398Mi/s items_per_second=230.055M/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536       899771 ns       888509 ns          749 bytes_per_second=70.3426Mi/s items_per_second=7.37595G/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304   71104797 ns     69327375 ns            8 bytes_per_second=57.6973Mi/s items_per_second=6.05G/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536          2025175 ns      2021084 ns          347 bytes_per_second=30.924Mi/s items_per_second=3.24262G/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304     1087905188 ns    395840500 ns            2 bytes_per_second=10.1051Mi/s items_per_second=1.05959G/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536         4118269 ns      4089947 ns          170 bytes_per_second=15.2814Mi/s items_per_second=1.60237G/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304    9901101750 ns   1684713000 ns            1 bytes_per_second=2.37429Mi/s items_per_second=248.963M/s null_percent=0 size=4.1943M

AlenkaF avatar Mar 14 '24 14:03 AlenkaF

Looking better after your last suggestions, Joris 🎉

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-w9c6kiee/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 16.45, 17.62, 11.63
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536           458536 ns       458333 ns         1504 bytes_per_second=13.3168Gi/s items_per_second=14.2988G/s
BatchToTensorSimple<UInt8Type>/4194304       51650596 ns     40859385 ns           13 bytes_per_second=9.56023Gi/s items_per_second=10.2652G/s
BatchToTensorSimple<UInt16Type>/65536          443072 ns       441767 ns         1327 bytes_per_second=13.8161Gi/s items_per_second=7.41748G/s
BatchToTensorSimple<UInt16Type>/4194304      37997653 ns     36128500 ns           18 bytes_per_second=10.8121Gi/s items_per_second=5.8047G/s
BatchToTensorSimple<UInt32Type>/65536          556504 ns       525753 ns         1625 bytes_per_second=11.6091Gi/s items_per_second=3.11629G/s
BatchToTensorSimple<UInt32Type>/4194304      47726554 ns     38831059 ns           17 bytes_per_second=10.0596Gi/s items_per_second=2.70035G/s
BatchToTensorSimple<UInt64Type>/65536          543929 ns       510296 ns         1071 bytes_per_second=11.9607Gi/s items_per_second=1.60534G/s
BatchToTensorSimple<UInt64Type>/4194304     104417887 ns     55937176 ns           17 bytes_per_second=6.98328Gi/s items_per_second=937.28M/s
BatchToTensorSimple<Int8Type>/65536            542291 ns       530495 ns         1000 bytes_per_second=11.5053Gi/s items_per_second=12.3537G/s
BatchToTensorSimple<Int8Type>/4194304        55069580 ns     44818231 ns           13 bytes_per_second=8.71576Gi/s items_per_second=9.35848G/s
BatchToTensorSimple<Int16Type>/65536           472947 ns       466738 ns         1604 bytes_per_second=13.077Gi/s items_per_second=7.02065G/s
BatchToTensorSimple<Int16Type>/4194304       45937775 ns     40318200 ns           15 bytes_per_second=9.68855Gi/s items_per_second=5.2015G/s
BatchToTensorSimple<Int32Type>/65536           439955 ns       438705 ns         1351 bytes_per_second=13.9126Gi/s items_per_second=3.73463G/s
BatchToTensorSimple<Int32Type>/4194304       38181667 ns     36099833 ns           18 bytes_per_second=10.8207Gi/s items_per_second=2.90466G/s
BatchToTensorSimple<Int64Type>/65536           440425 ns       439585 ns         1583 bytes_per_second=13.8847Gi/s items_per_second=1.86358G/s
BatchToTensorSimple<Int64Type>/4194304       51548936 ns     39940333 ns           15 bytes_per_second=9.78021Gi/s items_per_second=1.31268G/s
BatchToTensorSimple<HalfFloatType>/65536       435417 ns       434107 ns         1526 bytes_per_second=14.0599Gi/s items_per_second=7.54836G/s
BatchToTensorSimple<HalfFloatType>/4194304   48649122 ns     38652385 ns           13 bytes_per_second=10.1061Gi/s items_per_second=5.42567G/s
BatchToTensorSimple<FloatType>/65536           432115 ns       430647 ns         1522 bytes_per_second=14.1729Gi/s items_per_second=3.80451G/s
BatchToTensorSimple<FloatType>/4194304       42923344 ns     38628000 ns           16 bytes_per_second=10.1125Gi/s items_per_second=2.71455G/s
BatchToTensorSimple<DoubleType>/65536          442113 ns       441402 ns         1304 bytes_per_second=13.8276Gi/s items_per_second=1.85591G/s
BatchToTensorSimple<DoubleType>/4194304      60867021 ns     44292875 ns           16 bytes_per_second=8.81914Gi/s items_per_second=1.18368G/s

AlenkaF avatar Mar 21 '24 13:03 AlenkaF

Thanks for this @AlenkaF. I have two general suggestions here:

  1. Given that the types are purely physical here (i.e. float32 should use the same conversion code as int32 and uint32), we don't need to benchmark all numeric data types; we can limit ourselves to four integer types: int8, int16, int32, int64.
  2. On the other hand, it would be nice to exercise different numbers of columns, because that could affect conversion performance: for example 3, 30, 300.

Does that make sense, @AlenkaF @jorisvandenbossche?
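
To illustrate the first point, here is a minimal sketch of a conversion kernel that depends only on the byte width of the values. It is not Arrow's implementation: `InterleaveColumns` is a hypothetical helper, and a row-major output layout is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not Arrow's implementation: copies `columns` (each a
// contiguous buffer of `num_rows` values, `width` bytes per value) into one
// row-major buffer of shape (num_rows, columns.size()).
std::vector<std::uint8_t> InterleaveColumns(
    const std::vector<const std::uint8_t*>& columns, std::size_t num_rows,
    std::size_t width) {
  assert(width > 0);
  const std::size_t num_cols = columns.size();
  std::vector<std::uint8_t> out(num_rows * num_cols * width);
  for (std::size_t c = 0; c < num_cols; ++c) {
    for (std::size_t r = 0; r < num_rows; ++r) {
      // Only the byte width matters here; signedness and int-vs-float do not.
      std::memcpy(out.data() + (r * num_cols + c) * width,
                  columns[c] + r * width, width);
    }
  }
  return out;
}
```

In this sketch, int32, uint32 and float32 columns all go through the same path with `width == 4`. Consecutive writes for one column are `num_cols * width` bytes apart, which is one plausible reason the column count could affect performance.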

pitrou avatar Mar 21 '24 13:03 pitrou

It does! Will update 👍

AlenkaF avatar Mar 21 '24 14:03 AlenkaF

@pitrou I have included your suggestions. This is the output with the latest changes:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-s2l7kna2/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 19.25, 15.77, 10.23
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<Int8Type>/65536/3            3803 ns         3802 ns       179651 bytes_per_second=48.1626Gi/s items_per_second=51.7142G/s
BatchToTensorSimple<Int8Type>/65536/30         141770 ns       140390 ns         5332 bytes_per_second=13.0426Gi/s items_per_second=14.0044G/s
BatchToTensorSimple<Int8Type>/65536/300       1509271 ns      1488588 ns          471 bytes_per_second=12.3006Gi/s items_per_second=13.2077G/s
BatchToTensorSimple<Int8Type>/4194304/3        892806 ns       890394 ns          792 bytes_per_second=13.1613Gi/s items_per_second=14.1318G/s
BatchToTensorSimple<Int8Type>/4194304/30     10833571 ns     10319294 ns           68 bytes_per_second=11.3562Gi/s items_per_second=12.1936G/s
BatchToTensorSimple<Int8Type>/4194304/300  1128155000 ns    551951000 ns            1 bytes_per_second=2.12315Gi/s items_per_second=2.27972G/s
BatchToTensorSimple<Int16Type>/65536/3           3781 ns         3769 ns       185929 bytes_per_second=48.5878Gi/s items_per_second=26.0854G/s
BatchToTensorSimple<Int16Type>/65536/30        129794 ns       129615 ns         5636 bytes_per_second=14.1269Gi/s items_per_second=7.58431G/s
BatchToTensorSimple<Int16Type>/65536/300      1553976 ns      1550687 ns          435 bytes_per_second=11.808Gi/s items_per_second=6.33938G/s
BatchToTensorSimple<Int16Type>/4194304/3       824934 ns       822791 ns          882 bytes_per_second=14.2427Gi/s items_per_second=7.64648G/s
BatchToTensorSimple<Int16Type>/4194304/30     9991414 ns      9954623 ns           69 bytes_per_second=11.7722Gi/s items_per_second=6.32013G/s
BatchToTensorSimple<Int16Type>/4194304/300  791524063 ns    310795500 ns            2 bytes_per_second=3.77057Gi/s items_per_second=2.02431G/s
BatchToTensorSimple<Int32Type>/65536/3           3717 ns         3712 ns       183626 bytes_per_second=49.3228Gi/s items_per_second=13.24G/s
BatchToTensorSimple<Int32Type>/65536/30        135493 ns       134325 ns         5035 bytes_per_second=13.6315Gi/s items_per_second=3.65918G/s
BatchToTensorSimple<Int32Type>/65536/300      1607824 ns      1600713 ns          436 bytes_per_second=11.439Gi/s items_per_second=3.07063G/s
BatchToTensorSimple<Int32Type>/4194304/3       863068 ns       860123 ns          782 bytes_per_second=13.6245Gi/s items_per_second=3.6573G/s
BatchToTensorSimple<Int32Type>/4194304/30    10307080 ns     10272412 ns           68 bytes_per_second=11.408Gi/s items_per_second=3.06231G/s
BatchToTensorSimple<Int32Type>/4194304/300  261872267 ns    147986600 ns            5 bytes_per_second=7.91879Gi/s items_per_second=2.12568G/s
BatchToTensorSimple<Int64Type>/65536/3           3725 ns         3722 ns       183079 bytes_per_second=49.1992Gi/s items_per_second=6.60341G/s
BatchToTensorSimple<Int64Type>/65536/30        126616 ns       126444 ns         5720 bytes_per_second=14.4811Gi/s items_per_second=1.94362G/s
BatchToTensorSimple<Int64Type>/65536/300      1508292 ns      1506162 ns          445 bytes_per_second=12.1571Gi/s items_per_second=1.6317G/s
BatchToTensorSimple<Int64Type>/4194304/3       837330 ns       835840 ns          833 bytes_per_second=14.0203Gi/s items_per_second=1.88178G/s
BatchToTensorSimple<Int64Type>/4194304/30     9866716 ns      9823261 ns           69 bytes_per_second=11.9296Gi/s items_per_second=1.60116G/s
BatchToTensorSimple<Int64Type>/4194304/300  745086639 ns    325750333 ns            3 bytes_per_second=3.59746Gi/s items_per_second=482.843M/s

AlenkaF avatar Mar 22 '24 12:03 AlenkaF

Latest output:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-1wpfanyn/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 31.29, 25.96, 16.82
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<Int8Type>/65536/3            2441 ns         2440 ns       340594 bytes_per_second=25.0118Gi/s items_per_second=26.8563G/s
BatchToTensorSimple<Int8Type>/65536/30           3158 ns         3157 ns       219021 bytes_per_second=19.3277Gi/s items_per_second=20.753G/s
BatchToTensorSimple<Int8Type>/65536/300         13722 ns        13719 ns        50473 bytes_per_second=4.43975Gi/s items_per_second=4.76715G/s
BatchToTensorSimple<Int8Type>/4194304/3        277670 ns       277521 ns         2510 bytes_per_second=14.0755Gi/s items_per_second=15.1135G/s
BatchToTensorSimple<Int8Type>/4194304/30       293289 ns       293183 ns         2430 bytes_per_second=13.3236Gi/s items_per_second=14.3061G/s
BatchToTensorSimple<Int8Type>/4194304/300      298143 ns       297779 ns         2263 bytes_per_second=13.1179Gi/s items_per_second=14.0853G/s
BatchToTensorSimple<Int16Type>/65536/3           2181 ns         2179 ns       394604 bytes_per_second=28.0054Gi/s items_per_second=15.0353G/s
BatchToTensorSimple<Int16Type>/65536/30          3247 ns         3236 ns       220372 bytes_per_second=18.8588Gi/s items_per_second=10.1247G/s
BatchToTensorSimple<Int16Type>/65536/300        14148 ns        14137 ns        46621 bytes_per_second=4.30854Gi/s items_per_second=2.31313G/s
BatchToTensorSimple<Int16Type>/4194304/3       277347 ns       277092 ns         2553 bytes_per_second=14.0973Gi/s items_per_second=7.56842G/s
BatchToTensorSimple<Int16Type>/4194304/30      370514 ns       323043 ns         2535 bytes_per_second=12.092Gi/s items_per_second=6.49187G/s
BatchToTensorSimple<Int16Type>/4194304/300     297281 ns       296810 ns         2113 bytes_per_second=13.1598Gi/s items_per_second=7.06513G/s
BatchToTensorSimple<Int32Type>/65536/3           2349 ns         2346 ns       387584 bytes_per_second=26.0117Gi/s items_per_second=6.98246G/s
BatchToTensorSimple<Int32Type>/65536/30          3163 ns         3158 ns       213616 bytes_per_second=19.3208Gi/s items_per_second=5.18638G/s
BatchToTensorSimple<Int32Type>/65536/300        13852 ns        13840 ns        49582 bytes_per_second=4.3606Gi/s items_per_second=1.17054G/s
BatchToTensorSimple<Int32Type>/4194304/3       342283 ns       319630 ns         1969 bytes_per_second=12.2212Gi/s items_per_second=3.28059G/s
BatchToTensorSimple<Int32Type>/4194304/30      290756 ns       286728 ns         2381 bytes_per_second=13.6233Gi/s items_per_second=3.65699G/s
BatchToTensorSimple<Int32Type>/4194304/300     300295 ns       297110 ns         2360 bytes_per_second=13.1465Gi/s items_per_second=3.529G/s
BatchToTensorSimple<Int64Type>/65536/3           2204 ns         2197 ns       410967 bytes_per_second=27.7705Gi/s items_per_second=3.7273G/s
BatchToTensorSimple<Int64Type>/65536/30          3176 ns         3162 ns       216236 bytes_per_second=19.3002Gi/s items_per_second=2.59043G/s
BatchToTensorSimple<Int64Type>/65536/300        13656 ns        13588 ns        51372 bytes_per_second=4.4415Gi/s items_per_second=596.128M/s
BatchToTensorSimple<Int64Type>/4194304/3       270131 ns       268433 ns         2622 bytes_per_second=14.552Gi/s items_per_second=1.95313G/s
BatchToTensorSimple<Int64Type>/4194304/30      297324 ns       292629 ns         2026 bytes_per_second=13.3486Gi/s items_per_second=1.79162G/s
BatchToTensorSimple<Int64Type>/4194304/300     293260 ns       291513 ns         2290 bytes_per_second=13.3951Gi/s items_per_second=1.79786G/s
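
As a sanity check on the counters, the numbers in this run are consistent with the first benchmark argument being the total number of bytes processed per iteration, regardless of the column count. That reading is inferred from the output above, not from the benchmark source, and the helper names below are made up for the example. A minimal sketch of the arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Throughput in GiB/s for `bytes` processed in `cpu_time_ns` nanoseconds.
double GiBPerSecond(std::int64_t bytes, double cpu_time_ns) {
  assert(cpu_time_ns > 0);
  const double bytes_per_second =
      static_cast<double>(bytes) / (cpu_time_ns * 1e-9);
  return bytes_per_second / (1024.0 * 1024.0 * 1024.0);
}

// Values per second in units of 1e9 (the "G/s" suffix in the output),
// assuming each value occupies `item_width` bytes.
double GigaItemsPerSecond(std::int64_t bytes, std::int64_t item_width,
                          double cpu_time_ns) {
  assert(cpu_time_ns > 0 && item_width > 0);
  const double items = static_cast<double>(bytes / item_width);
  return items / (cpu_time_ns * 1e-9) / 1e9;
}
```

For example, `GiBPerSecond(65536, 2440)` gives about 25.01, close to the 25.0118Gi/s reported for `Int8Type/65536/3`, and `GigaItemsPerSecond(65536, 8, 2197)` gives about 3.73, close to the 3.7273G/s reported for `Int64Type/65536/3`. The small differences come from the CPU times being rounded in the table.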

AlenkaF avatar Mar 24 '24 06:03 AlenkaF

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fc87fd75d6602562e64abf8744890332e35f979e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.