GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

Open mapleFU opened this issue 1 year ago • 9 comments

Rationale for this change

This patch reuses the same buffers in CompressedInputStream. It covers both the decompress_ and compress_ buffers.

What changes are included in this PR?

  1. For compress_, allocate a single buffer of kChunkSize (64KB) and reuse it
  2. For decompress_, reuse the same buffer (mostly 1MB) instead of continuously reallocating

In the worst case, decompress_ might hold a large buffer.
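
To illustrate the idea (this is only a rough sketch, not the code in this PR): `ReusableBuffer`, `Reserve`, and `CountGrows` are hypothetical names, and a `std::vector` stands in for Arrow's resizable buffer. The buffer grows only when a request exceeds its current size, so repeated 64KB chunk reads allocate once, and a single large request leaves the buffer large afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the stream's scratch buffer; not Arrow's API.
class ReusableBuffer {
 public:
  // Make at least `nbytes` available, growing only when the current
  // storage is too small. Returns a pointer to the start of the storage.
  uint8_t* Reserve(std::size_t nbytes) {
    if (nbytes > storage_.size()) {
      storage_.resize(nbytes);
      ++num_grows_;
    }
    return storage_.data();
  }

  std::size_t capacity() const { return storage_.size(); }
  int num_grows() const { return num_grows_; }

 private:
  std::vector<uint8_t> storage_;
  int num_grows_ = 0;
};

// Simulates reading `num_chunks` chunks of `chunk_size` bytes through one
// buffer and returns how many times the buffer had to grow.
inline int CountGrows(std::size_t chunk_size, int num_chunks) {
  assert(chunk_size > 0);
  ReusableBuffer buf;
  for (int i = 0; i < num_chunks; ++i) {
    uint8_t* dst = buf.Reserve(chunk_size);
    dst[0] = static_cast<uint8_t>(i);  // pretend to fill the chunk
  }
  return buf.num_grows();
}
```

With this sketch, `CountGrows(64 * 1024, 100)` grows the buffer once for all 100 chunks. The cost is the one mentioned above: after a single 1MB request the buffer stays at 1MB even if later reads are small.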

Are these changes tested?

Yes, they are already covered by the existing tests.

Are there any user-facing changes?

CompressedInputStream might hold a larger buffer.

  • Closes: #39377

mapleFU avatar Jan 26 '24 14:01 mapleFU

:warning: GitHub issue #39377 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Jan 26 '24 14:01 github-actions[bot]

cc @pitrou @felipecrv

mapleFU avatar Jan 26 '24 14:01 mapleFU

Have you run any benchmarks?

Not yet, let me find and run them.

mapleFU avatar Jan 30 '24 14:01 mapleFU

I really need to work on these stream classes before I can feel confident reviewing these optimizations.

felipecrv avatar Feb 01 '24 14:02 felipecrv

Before the optimization:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192            16717 ns        16318 ns        43287 bytes_per_second=73.5798M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536           96598 ns        94595 ns         6962 bytes_per_second=97.6409M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576       1592301 ns      1589814 ns          440 bytes_per_second=90.8238M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192         19860 ns        19794 ns        36185 bytes_per_second=60.6601M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536       103077 ns       101591 ns         7024 bytes_per_second=90.9171M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576    1680448 ns      1636030 ns          428 bytes_per_second=88.2581M/s

After:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192            18100 ns        17072 ns        40079 bytes_per_second=70.3304M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536           95859 ns        93712 ns         7438 bytes_per_second=98.5613M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576       1629148 ns      1608965 ns          428 bytes_per_second=89.7428M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192         20614 ns        20083 ns        33628 bytes_per_second=59.7863M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536       100852 ns        98297 ns         6913 bytes_per_second=93.9637M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576    1615719 ns      1608626 ns          433 bytes_per_second=89.7617M/s

It seems to be even slower in some cases. I'll look into it later.

mapleFU avatar Feb 05 '24 17:02 mapleFU

You should benchmark using a faster codec such as LZ4, if you really want to measure the overhead of CompressedInputStream.

pitrou avatar Feb 05 '24 17:02 pitrou

(After changing supports_zero_copy_from_raw_ to const, my optimization is a little faster. I'll dive into it tomorrow.)

mapleFU avatar Feb 05 '24 17:02 mapleFU

Sorry for the delay, I've been swamped with work these past two weeks. I'll improve this over the weekend.

mapleFU avatar Mar 08 '24 16:03 mapleFU

It's ok @mapleFU !

pitrou avatar Mar 08 '24 17:03 pitrou

Under LLVM-17, MacOS M1 Pro, Release (-O2):

After:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                    14066 ns        14042 ns        50325 bytes_per_second=85.509M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                   81058 ns        80930 ns         8516 bytes_per_second=114.127M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536                  85914 ns        85865 ns         7871 bytes_per_second=107.568M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192               1383077 ns      1380249 ns          511 bytes_per_second=104.614M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536              1381771 ns      1379589 ns          504 bytes_per_second=104.664M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576            1449293 ns      1445271 ns          484 bytes_per_second=99.9072M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                 17520 ns        17086 ns        40610 bytes_per_second=70.2738M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                90540 ns        86818 ns         8047 bytes_per_second=106.387M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536               92686 ns        90816 ns         7614 bytes_per_second=101.704M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192            1416457 ns      1387700 ns          510 bytes_per_second=104.052M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536           1403328 ns      1397628 ns          505 bytes_per_second=103.313M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576         1469190 ns      1460000 ns          481 bytes_per_second=98.8993M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                    13032 ns        12953 ns        54602 bytes_per_second=91.2257M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                   57312 ns        57157 ns        12273 bytes_per_second=162.463M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536                  65626 ns        64273 ns        10869 bytes_per_second=144.477M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192                925974 ns       925072 ns          746 bytes_per_second=158.115M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536               961608 ns       959083 ns          750 bytes_per_second=152.508M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576            1029553 ns      1028537 ns          680 bytes_per_second=142.21M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                 20305 ns        17293 ns        46128 bytes_per_second=68.3272M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                60087 ns        59985 ns        11039 bytes_per_second=154.805M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536               74347 ns        69853 ns        10346 bytes_per_second=132.936M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192            1114509 ns       992978 ns          721 bytes_per_second=147.302M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536            960557 ns       959252 ns          710 bytes_per_second=152.481M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576         1042081 ns      1027100 ns          700 bytes_per_second=142.409M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                7225 ns         7220 ns        88111 bytes_per_second=300.086M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              25710 ns        23944 ns        30632 bytes_per_second=760.71M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             30453 ns        30158 ns        24247 bytes_per_second=603.954M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           398575 ns       396622 ns         1771 bytes_per_second=715.976M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          403529 ns       400651 ns         1783 bytes_per_second=708.777M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        467985 ns       463299 ns         1513 bytes_per_second=612.934M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             9285 ns         9243 ns        71174 bytes_per_second=234.419M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           29829 ns        28100 ns        25100 bytes_per_second=648.201M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          36288 ns        34810 ns        20527 bytes_per_second=523.245M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        402043 ns       398348 ns         1715 bytes_per_second=712.874M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       422659 ns       416023 ns         1623 bytes_per_second=682.587M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     498678 ns       489495 ns         1440 bytes_per_second=580.132M/s

Before:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                    14371 ns        14325 ns        46902 bytes_per_second=83.8181M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                   77623 ns        77507 ns         9161 bytes_per_second=119.168M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536                  87373 ns        87304 ns         8358 bytes_per_second=105.795M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192               1383045 ns      1382063 ns          504 bytes_per_second=104.476M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536              1370321 ns      1369469 ns          512 bytes_per_second=105.437M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576            1433147 ns      1432126 ns          493 bytes_per_second=100.824M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                 16448 ns        16447 ns        41847 bytes_per_second=73.0014M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                82950 ns        82436 ns         8123 bytes_per_second=112.042M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536               88021 ns        87920 ns         7805 bytes_per_second=105.053M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192            1384970 ns      1383970 ns          506 bytes_per_second=104.332M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536           1386657 ns      1385639 ns          509 bytes_per_second=104.207M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576         1444835 ns      1443245 ns          490 bytes_per_second=100.047M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                    13062 ns        12891 ns        50916 bytes_per_second=91.6604M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                   56024 ns        55910 ns        11993 bytes_per_second=166.088M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536                  65663 ns        64584 ns        11142 bytes_per_second=143.781M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192                963169 ns       941353 ns          734 bytes_per_second=155.381M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536              1048394 ns       972503 ns          733 bytes_per_second=150.403M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576            1031212 ns      1028287 ns          687 bytes_per_second=142.244M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                 16712 ns        16258 ns        43318 bytes_per_second=72.6775M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                60878 ns        60527 ns        11269 bytes_per_second=153.417M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536               68693 ns        67494 ns        10350 bytes_per_second=137.581M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192             950998 ns       946565 ns          722 bytes_per_second=154.525M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536            964300 ns       962337 ns          733 bytes_per_second=151.992M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576         1029719 ns      1028186 ns          665 bytes_per_second=142.258M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                7108 ns         7084 ns        92116 bytes_per_second=305.886M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              22935 ns        22908 ns        27823 bytes_per_second=795.094M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             30680 ns        30548 ns        24209 bytes_per_second=596.256M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           389600 ns       389246 ns         1755 bytes_per_second=729.544M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          395812 ns       395273 ns         1705 bytes_per_second=718.419M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        457402 ns       456781 ns         1518 bytes_per_second=621.68M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             9360 ns         9350 ns        76763 bytes_per_second=231.729M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           27402 ns        27259 ns        25881 bytes_per_second=668.181M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          32157 ns        32138 ns        20886 bytes_per_second=566.753M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        444164 ns       441893 ns         1583 bytes_per_second=642.625M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       446398 ns       446078 ns         1550 bytes_per_second=636.597M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     493388 ns       493005 ns         1300 bytes_per_second=576.001M/s

mapleFU avatar Mar 18 '24 03:03 mapleFU

After reducing the number of calls to ResizableBuffer::Resize, the current PR gives:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                 17346 ns        14395 ns        47957 bytes_per_second=83.4107M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                74538 ns        74469 ns         8225 bytes_per_second=124.029M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536               83750 ns        82631 ns         8278 bytes_per_second=111.778M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192            1345548 ns      1337842 ns          530 bytes_per_second=107.93M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536           1358944 ns      1357852 ns          534 bytes_per_second=106.339M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576         1397426 ns      1391512 ns          508 bytes_per_second=103.767M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                 12235 ns        12200 ns        58011 bytes_per_second=96.8532M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                55298 ns        55255 ns        12700 bytes_per_second=168.055M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536               60370 ns        59764 ns        11419 bytes_per_second=155.377M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192             903461 ns       900298 ns          769 bytes_per_second=162.466M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536            950861 ns       946309 ns          744 bytes_per_second=154.567M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576          978464 ns       977109 ns          727 bytes_per_second=149.695M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             6457 ns         6451 ns       105751 bytes_per_second=335.891M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           21589 ns        21585 ns        31528 bytes_per_second=843.832M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          28415 ns        28346 ns        24531 bytes_per_second=642.569M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        382945 ns       382345 ns         1850 bytes_per_second=742.711M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       372741 ns       372615 ns         1829 bytes_per_second=762.106M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     425632 ns       425422 ns         1617 bytes_per_second=667.506M/s

Note that decompressing 1024 * 1024 bytes of compressed data becomes faster; the other cases are essentially unchanged.
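
For reference, a small helper for reading these tables (a sketch; `PercentChange` is a hypothetical name, and it assumes the "Before" and "After" runs come from the same machine and build). It computes the relative change in wall time between two rows.

```cpp
#include <cassert>

// Relative change in wall time, in percent; negative means the run got faster.
inline double PercentChange(double before_ns, double after_ns) {
  assert(before_ns > 0);
  return (after_ns - before_ns) / before_ns * 100.0;
}
```

For example, for the LZ4_FRAME zero-copy row with InputBytes:1048576/PerReadBytes:1048576, `PercentChange(457402, 425632)` is about -6.9, i.e. roughly 7% less wall time than the "Before" run above. Single runs are noisy, so a difference of this size is only an indication.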

mapleFU avatar Mar 18 '24 03:03 mapleFU

On my Windows machine (WSL, Ubuntu 22), AMD 3800X with gcc 11.4, Release (-O2):

After:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                    15113 ns        15153 ns        45795 bytes_per_second=79.4275Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                  121673 ns       121761 ns         5686 bytes_per_second=75.8248Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536                 122201 ns       122270 ns         5718 bytes_per_second=75.5095Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192               1918363 ns      1918733 ns          357 bytes_per_second=75.2846Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536              1925816 ns      1926215 ns          362 bytes_per_second=74.9922Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576            1949610 ns      1950057 ns          363 bytes_per_second=74.0753Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                 15292 ns        15331 ns        44844 bytes_per_second=78.5026Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192               122159 ns       122242 ns         5717 bytes_per_second=75.5267Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536              122144 ns       122221 ns         5687 bytes_per_second=75.5394Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192            1916707 ns      1917085 ns          361 bytes_per_second=75.3494Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536           1936225 ns      1936580 ns          356 bytes_per_second=74.5909Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576         1974165 ns      1974662 ns          354 bytes_per_second=73.1523Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                     7642 ns         7655 ns        91056 bytes_per_second=154.365Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                   39511 ns        39541 ns        17486 bytes_per_second=233.445Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536                  40372 ns        40405 ns        17402 bytes_per_second=228.45Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192                742301 ns       742596 ns          942 bytes_per_second=196.959Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536               741678 ns       741958 ns          946 bytes_per_second=197.129Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576             749815 ns       750119 ns          938 bytes_per_second=194.984Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                  8083 ns         8097 ns        86602 bytes_per_second=145.937Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                40381 ns        40418 ns        17284 bytes_per_second=228.379Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536               40977 ns        41010 ns        16777 bytes_per_second=225.084Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192             743131 ns       743474 ns          943 bytes_per_second=196.727Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536            753482 ns       753796 ns          920 bytes_per_second=194.033Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576          763397 ns       763723 ns          914 bytes_per_second=191.511Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2627 ns         2658 ns       259151 bytes_per_second=801.929Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              14589 ns        14627 ns        47447 bytes_per_second=1.21655Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15057 ns        15091 ns        47015 bytes_per_second=1.17918Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           292254 ns       292447 ns         2399 bytes_per_second=973.26Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          296420 ns       296544 ns         2361 bytes_per_second=959.815Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        299178 ns       299350 ns         2342 bytes_per_second=950.818Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3133 ns         3165 ns       222458 bytes_per_second=673.399Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           15510 ns        15550 ns        44675 bytes_per_second=1.14436Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          15672 ns        15700 ns        44787 bytes_per_second=1.1334Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        296992 ns       297173 ns         2367 bytes_per_second=957.784Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       302407 ns       302596 ns         2294 bytes_per_second=940.617Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     304691 ns       304865 ns         2288 bytes_per_second=933.618Mi/s

Before:

CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                    15091 ns        15129 ns        44783 bytes_per_second=79.5491Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192                  124276 ns       124365 ns         5609 bytes_per_second=74.2374Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536                 125119 ns       125202 ns         5581 bytes_per_second=73.7413Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192               1967307 ns      1967803 ns          357 bytes_per_second=73.4073Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536              1966845 ns      1967298 ns          358 bytes_per_second=73.4262Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576            1969676 ns      1970107 ns          355 bytes_per_second=73.3215Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192                 15471 ns        15510 ns        44616 bytes_per_second=77.5999Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192               125884 ns       125971 ns         5556 bytes_per_second=73.2907Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536              123910 ns       123991 ns         5612 bytes_per_second=74.4612Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192            2012901 ns      2013295 ns          354 bytes_per_second=71.7486Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536           2019119 ns      2019770 ns          349 bytes_per_second=71.5186Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576         1979235 ns      1979722 ns          346 bytes_per_second=72.9654Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                     7864 ns         7883 ns        88901 bytes_per_second=149.899Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                   40161 ns        40195 ns        17269 bytes_per_second=229.646Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536                  40392 ns        40423 ns        17157 bytes_per_second=228.351Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192                762881 ns       763268 ns          942 bytes_per_second=191.625Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536               749413 ns       749734 ns          929 bytes_per_second=195.084Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576             763498 ns       763866 ns          918 bytes_per_second=191.475Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192                  8482 ns         8500 ns        81298 bytes_per_second=139.019Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192                40456 ns        40489 ns        16916 bytes_per_second=227.979Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536               41042 ns        41078 ns        16997 bytes_per_second=224.709Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192             741352 ns       741649 ns          930 bytes_per_second=197.211Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536            749918 ns       750239 ns          936 bytes_per_second=194.953Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576          758674 ns       758991 ns          930 bytes_per_second=192.705Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2645 ns         2675 ns       261762 bytes_per_second=796.956Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              14682 ns        14715 ns        46999 bytes_per_second=1.20928Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15221 ns        15248 ns        45742 bytes_per_second=1.16704Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           301442 ns       301624 ns         2331 bytes_per_second=943.649Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          307876 ns       308057 ns         2276 bytes_per_second=923.943Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        304182 ns       304369 ns         2301 bytes_per_second=935.136Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3257 ns         3288 ns       211478 bytes_per_second=648.168Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           16088 ns        16128 ns        43704 bytes_per_second=1.10335Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          16290 ns        16319 ns        42819 bytes_per_second=1.09042Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        305555 ns       305746 ns         2311 bytes_per_second=930.925Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       308251 ns       308442 ns         2274 bytes_per_second=922.79Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     310809 ns       311000 ns         2257 bytes_per_second=915.199Mi/s

mapleFU avatar Mar 18 '24 16:03 mapleFU

cc @pitrou @felipecrv would you mind taking a look?

mapleFU avatar Mar 19 '24 13:03 mapleFU

Damn, the benchmark results on my new M2 macOS machine are so unstable... I'll test it on my PC.

mapleFU avatar Mar 20 '24 14:03 mapleFU

On my 3800X, in WSL2 with Ubuntu 22 and GCC 11.4:

Before:

CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2907 ns         2951 ns       236137 bytes_per_second=722.253Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              14712 ns        14757 ns        47026 bytes_per_second=1.20586Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15105 ns        15141 ns        46329 bytes_per_second=1.17528Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           291833 ns       292024 ns         2367 bytes_per_second=974.669Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          299113 ns       299305 ns         2342 bytes_per_second=950.959Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        302720 ns       302917 ns         2305 bytes_per_second=939.621Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3395 ns         3430 ns       204188 bytes_per_second=621.394Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           16068 ns        16115 ns        43575 bytes_per_second=1.10422Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          16481 ns        16520 ns        42656 bytes_per_second=1.07716Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        302145 ns       302352 ns         2330 bytes_per_second=941.375Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       306038 ns       306229 ns         2286 bytes_per_second=929.459Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     310153 ns       310357 ns         2262 bytes_per_second=917.094Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2942 ns         2984 ns       234527 bytes_per_second=714.399Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              16035 ns        16081 ns        43395 bytes_per_second=1.10658Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15628 ns        15666 ns        44262 bytes_per_second=1.13587Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           310841 ns       311057 ns         2227 bytes_per_second=915.031Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          304648 ns       304836 ns         2269 bytes_per_second=933.704Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        299733 ns       299932 ns         2299 bytes_per_second=948.972Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3581 ns         3619 ns       195202 bytes_per_second=589.04Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           17593 ns        17636 ns        39367 bytes_per_second=1.009Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          17029 ns        17068 ns        41040 bytes_per_second=1.04259Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        320371 ns       320586 ns         2140 bytes_per_second=887.834Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       317159 ns       317359 ns         2200 bytes_per_second=896.861Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     309201 ns       309384 ns         2258 bytes_per_second=919.979Mi/s

After:

CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2761 ns         2795 ns       247733 bytes_per_second=762.723Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              14625 ns        14666 ns        46769 bytes_per_second=1.21333Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15123 ns        15160 ns        46801 bytes_per_second=1.17383Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           288348 ns       288527 ns         2423 bytes_per_second=986.484Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          294416 ns       294585 ns         2367 bytes_per_second=966.197Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        297685 ns       297902 ns         2349 bytes_per_second=955.438Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3194 ns         3226 ns       216546 bytes_per_second=660.737Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           15455 ns        15497 ns        45292 bytes_per_second=1.14826Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          16032 ns        16067 ns        43646 bytes_per_second=1.10751Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        295400 ns       295587 ns         2352 bytes_per_second=962.921Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       297931 ns       298102 ns         2336 bytes_per_second=954.796Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     300221 ns       300394 ns         2322 bytes_per_second=947.512Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192                2849 ns         2881 ns       241315 bytes_per_second=739.774Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192              15588 ns        15629 ns        44292 bytes_per_second=1.13861Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536             15303 ns        15335 ns        45736 bytes_per_second=1.16039Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192           307433 ns       307581 ns         2290 bytes_per_second=925.372Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536          302313 ns       302499 ns         2316 bytes_per_second=940.92Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576        296901 ns       297087 ns         2366 bytes_per_second=958.061Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192             3208 ns         3244 ns       216204 bytes_per_second=656.994Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192           16401 ns        16444 ns        42091 bytes_per_second=1.08215Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536          16093 ns        16129 ns        44188 bytes_per_second=1.10331Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192        312056 ns       312247 ns         2243 bytes_per_second=911.544Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536       306802 ns       306988 ns         2288 bytes_per_second=927.16Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576     306993 ns       307203 ns         2268 bytes_per_second=926.512Mi/s

mapleFU avatar Mar 20 '24 15:03 mapleFU

@mapleFU Can you update to the latest git main?

pitrou avatar Mar 20 '24 15:03 pitrou

Done. It's a bit late in my timezone, so I'm going to sleep now.

mapleFU avatar Mar 20 '24 15:03 mapleFU

Here are the benchmark results (Ubuntu 22.04, AMD Zen 2 CPU):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (24)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                              benchmark        baseline       contender  change %                                                                                                                                                                                                                                                              counters
    CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536   1.102 GiB/sec   1.186 GiB/sec     7.628    {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 42847}
     CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192   1.074 GiB/sec   1.137 GiB/sec     5.855     {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 40979}
    CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536   1.134 GiB/sec   1.192 GiB/sec     5.100    {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43750}
   CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 921.576 MiB/sec 962.690 MiB/sec     4.461    {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2276}
      CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 616.148 MiB/sec 643.130 MiB/sec     4.379     {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 204011}
         CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 708.346 MiB/sec 737.983 MiB/sec     4.184        {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 238344}
  CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 889.445 MiB/sec 921.203 MiB/sec     3.570   {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2202}
        CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192   1.183 GiB/sec   1.222 GiB/sec     3.276        {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 47400}
       CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536   1.194 GiB/sec   1.230 GiB/sec     3.003       {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 47112}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 920.745 MiB/sec 941.653 MiB/sec     2.271 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2221}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 918.199 MiB/sec 937.309 MiB/sec     2.081 {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2247}
     CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 910.037 MiB/sec 924.471 MiB/sec     1.586      {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2261}
     CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192   1.176 GiB/sec   1.192 GiB/sec     1.299     {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 45065}
      CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 598.409 MiB/sec 605.872 MiB/sec     1.247     {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 195099}
   CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 943.018 MiB/sec 953.685 MiB/sec     1.131    {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2298}
  CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 916.860 MiB/sec 924.973 MiB/sec     0.885   {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2231}
      CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 909.318 MiB/sec 915.330 MiB/sec     0.661       {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2187}
     CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 934.528 MiB/sec 940.318 MiB/sec     0.620      {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2322}
   CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 892.891 MiB/sec 896.477 MiB/sec     0.402    {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2194}
        CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192   1.306 GiB/sec   1.308 GiB/sec     0.192        {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 50894}
      CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 958.533 MiB/sec 959.449 MiB/sec     0.096       {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2459}
         CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 718.293 MiB/sec 712.813 MiB/sec    -0.763        {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 239597}
   CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 942.925 MiB/sec 935.223 MiB/sec    -0.817    {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2314}
       CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536   1.265 GiB/sec   1.251 GiB/sec    -1.104       {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 49579}

pitrou avatar Mar 20 '24 16:03 pitrou

Slightly faster (see the percentages).

pitrou avatar Mar 20 '24 16:03 pitrou

Thanks! I see that it has a baseline there.

mapleFU avatar Mar 20 '24 16:03 mapleFU

@ursabot please benchmark lang=C++

mapleFU avatar Mar 26 '24 17:03 mapleFU

Benchmark runs are scheduled for commit dff8f9c7790e74cb0915a8b164055848a076c0e3. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

ursabot avatar Mar 26 '24 17:03 ursabot

@pitrou Should I update the code or add other benchmarks?

mapleFU avatar Mar 26 '24 17:03 mapleFU

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit dff8f9c7790e74cb0915a8b164055848a076c0e3.

There were 20 benchmark results indicating a performance regression:

The full Conbench report has more details.

😅 I didn't touch any logic related to expr...

mapleFU avatar Mar 27 '24 02:03 mapleFU

@ursabot please benchmark lang=R,Python

pitrou avatar Mar 27 '24 08:03 pitrou

Supported benchmark command examples:

@ursabot benchmark help

To run all benchmarks:
@ursabot please benchmark

To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java
@ursabot please benchmark lang=JavaScript

To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*

To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3

For other command=cpp-micro options, please see https://github.com/voltrondata-labs/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py

ursabot avatar Mar 27 '24 08:03 ursabot

@ursabot please benchmark lang=Python

pitrou avatar Mar 27 '24 08:03 pitrou

@ursabot please benchmark lang=R

pitrou avatar Mar 27 '24 08:03 pitrou

Commit dff8f9c7790e74cb0915a8b164055848a076c0e3 already has scheduled benchmark runs.

ursabot avatar Mar 27 '24 08:03 ursabot