GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream
Rationale for this change
This patch reuses the same buffers in CompressedInputStream. It covers both the decompress_ and compress_ buffers.
What changes are included in this PR?
- For compress_, allocate one buffer of kChunkSize (64KB) and reuse it.
- For decompress_, reuse the same buffer (mostly 1MB) instead of calling Reallocate repeatedly. In the worst case, decompress_ might hold a large buffer.
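A minimal sketch of the reuse pattern described above, not the PR's actual code. The class ReusableBuffers and its methods are hypothetical names, and std::vector stands in for Arrow's buffer types. Only kChunkSize (64KB) and the grow-but-never-shrink behaviour of the decompressed buffer come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the stream's two scratch buffers (not Arrow's API).
class ReusableBuffers {
 public:
  // 64KB, the chunk size named in the PR description.
  static constexpr std::size_t kChunkSize = 64 * 1024;

  // Returns the compressed-side chunk; it is sized once and then reused.
  std::vector<std::uint8_t>& CompressedChunk() {
    if (compressed_.empty()) {
      compressed_.resize(kChunkSize);
      ++grow_count_;
    }
    return compressed_;
  }

  // Returns a decompressed-side buffer holding at least `min_size` bytes.
  // It grows on demand but never shrinks, so after one large read it keeps
  // holding a large buffer (the worst case mentioned in the description).
  std::vector<std::uint8_t>& DecompressedBuffer(std::size_t min_size) {
    assert(min_size > 0);
    if (decompressed_.size() < min_size) {
      decompressed_.resize(min_size);
      ++grow_count_;
    }
    return decompressed_;
  }

  // Number of times either buffer had to be (re)sized.
  int grow_count() const { return grow_count_; }

 private:
  std::vector<std::uint8_t> compressed_;
  std::vector<std::uint8_t> decompressed_;
  int grow_count_ = 0;
};
```

In this sketch, repeated reads of the same size touch the allocator only on the first call; the trade-off is that memory is retained for the lifetime of the object.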
Are these changes tested?
Yes, the existing tests already cover them.
Are there any user-facing changes?
CompressedInputStream might hold a larger buffer.
- Closes: #39377
:warning: GitHub issue #39377 has been automatically assigned in GitHub to PR creator.
cc @pitrou @felipecrv
Have you run any benchmarks?
Not yet, let me find and run them.
I really need to work on these stream classes before I can feel confident reviewing these optimizations.
Before the optimization:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192 16717 ns 16318 ns 43287 bytes_per_second=73.5798M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536 96598 ns 94595 ns 6962 bytes_per_second=97.6409M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576 1592301 ns 1589814 ns 440 bytes_per_second=90.8238M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192 19860 ns 19794 ns 36185 bytes_per_second=60.6601M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536 103077 ns 101591 ns 7024 bytes_per_second=90.9171M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576 1680448 ns 1636030 ns 428 bytes_per_second=88.2581M/s
After:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192 18100 ns 17072 ns 40079 bytes_per_second=70.3304M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536 95859 ns 93712 ns 7438 bytes_per_second=98.5613M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576 1629148 ns 1608965 ns 428 bytes_per_second=89.7428M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192 20614 ns 20083 ns 33628 bytes_per_second=59.7863M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536 100852 ns 98297 ns 6913 bytes_per_second=93.9637M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576 1615719 ns 1608626 ns 433 bytes_per_second=89.7617M/s
It seems to be even slower in some cases. I'll look into it later.
You should benchmark using a faster codec such as LZ4, if you really want to measure the overhead of CompressedInputStream.
(After changing supports_zero_copy_from_raw_ to const, my optimization is a little faster. I'll dive into it tomorrow.)
Sorry for the delay, I've had too much work these two weeks. I'll improve this over the weekend.
It's ok @mapleFU !
Under LLVM-17, MacOS M1 Pro, Release (-O2):
After:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 14066 ns 14042 ns 50325 bytes_per_second=85.509M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 81058 ns 80930 ns 8516 bytes_per_second=114.127M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 85914 ns 85865 ns 7871 bytes_per_second=107.568M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1383077 ns 1380249 ns 511 bytes_per_second=104.614M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1381771 ns 1379589 ns 504 bytes_per_second=104.664M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1449293 ns 1445271 ns 484 bytes_per_second=99.9072M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 17520 ns 17086 ns 40610 bytes_per_second=70.2738M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 90540 ns 86818 ns 8047 bytes_per_second=106.387M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 92686 ns 90816 ns 7614 bytes_per_second=101.704M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1416457 ns 1387700 ns 510 bytes_per_second=104.052M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1403328 ns 1397628 ns 505 bytes_per_second=103.313M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1469190 ns 1460000 ns 481 bytes_per_second=98.8993M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 13032 ns 12953 ns 54602 bytes_per_second=91.2257M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 57312 ns 57157 ns 12273 bytes_per_second=162.463M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 65626 ns 64273 ns 10869 bytes_per_second=144.477M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 925974 ns 925072 ns 746 bytes_per_second=158.115M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 961608 ns 959083 ns 750 bytes_per_second=152.508M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 1029553 ns 1028537 ns 680 bytes_per_second=142.21M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 20305 ns 17293 ns 46128 bytes_per_second=68.3272M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 60087 ns 59985 ns 11039 bytes_per_second=154.805M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 74347 ns 69853 ns 10346 bytes_per_second=132.936M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 1114509 ns 992978 ns 721 bytes_per_second=147.302M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 960557 ns 959252 ns 710 bytes_per_second=152.481M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 1042081 ns 1027100 ns 700 bytes_per_second=142.409M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 7225 ns 7220 ns 88111 bytes_per_second=300.086M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 25710 ns 23944 ns 30632 bytes_per_second=760.71M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 30453 ns 30158 ns 24247 bytes_per_second=603.954M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 398575 ns 396622 ns 1771 bytes_per_second=715.976M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 403529 ns 400651 ns 1783 bytes_per_second=708.777M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 467985 ns 463299 ns 1513 bytes_per_second=612.934M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 9285 ns 9243 ns 71174 bytes_per_second=234.419M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 29829 ns 28100 ns 25100 bytes_per_second=648.201M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 36288 ns 34810 ns 20527 bytes_per_second=523.245M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 402043 ns 398348 ns 1715 bytes_per_second=712.874M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 422659 ns 416023 ns 1623 bytes_per_second=682.587M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 498678 ns 489495 ns 1440 bytes_per_second=580.132M/s
Before:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 14371 ns 14325 ns 46902 bytes_per_second=83.8181M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 77623 ns 77507 ns 9161 bytes_per_second=119.168M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 87373 ns 87304 ns 8358 bytes_per_second=105.795M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1383045 ns 1382063 ns 504 bytes_per_second=104.476M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1370321 ns 1369469 ns 512 bytes_per_second=105.437M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1433147 ns 1432126 ns 493 bytes_per_second=100.824M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 16448 ns 16447 ns 41847 bytes_per_second=73.0014M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 82950 ns 82436 ns 8123 bytes_per_second=112.042M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 88021 ns 87920 ns 7805 bytes_per_second=105.053M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1384970 ns 1383970 ns 506 bytes_per_second=104.332M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1386657 ns 1385639 ns 509 bytes_per_second=104.207M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1444835 ns 1443245 ns 490 bytes_per_second=100.047M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 13062 ns 12891 ns 50916 bytes_per_second=91.6604M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 56024 ns 55910 ns 11993 bytes_per_second=166.088M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 65663 ns 64584 ns 11142 bytes_per_second=143.781M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 963169 ns 941353 ns 734 bytes_per_second=155.381M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 1048394 ns 972503 ns 733 bytes_per_second=150.403M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 1031212 ns 1028287 ns 687 bytes_per_second=142.244M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 16712 ns 16258 ns 43318 bytes_per_second=72.6775M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 60878 ns 60527 ns 11269 bytes_per_second=153.417M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 68693 ns 67494 ns 10350 bytes_per_second=137.581M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 950998 ns 946565 ns 722 bytes_per_second=154.525M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 964300 ns 962337 ns 733 bytes_per_second=151.992M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 1029719 ns 1028186 ns 665 bytes_per_second=142.258M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 7108 ns 7084 ns 92116 bytes_per_second=305.886M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 22935 ns 22908 ns 27823 bytes_per_second=795.094M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 30680 ns 30548 ns 24209 bytes_per_second=596.256M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 389600 ns 389246 ns 1755 bytes_per_second=729.544M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 395812 ns 395273 ns 1705 bytes_per_second=718.419M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 457402 ns 456781 ns 1518 bytes_per_second=621.68M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 9360 ns 9350 ns 76763 bytes_per_second=231.729M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 27402 ns 27259 ns 25881 bytes_per_second=668.181M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 32157 ns 32138 ns 20886 bytes_per_second=566.753M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 444164 ns 441893 ns 1583 bytes_per_second=642.625M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 446398 ns 446078 ns 1550 bytes_per_second=636.597M/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 493388 ns 493005 ns 1300 bytes_per_second=576.001M/s
After reducing the calls to ResizableBuffer::Resize, the current PR gives:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 17346 ns 14395 ns 47957 bytes_per_second=83.4107M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 74538 ns 74469 ns 8225 bytes_per_second=124.029M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 83750 ns 82631 ns 8278 bytes_per_second=111.778M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1345548 ns 1337842 ns 530 bytes_per_second=107.93M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1358944 ns 1357852 ns 534 bytes_per_second=106.339M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1397426 ns 1391512 ns 508 bytes_per_second=103.767M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 12235 ns 12200 ns 58011 bytes_per_second=96.8532M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 55298 ns 55255 ns 12700 bytes_per_second=168.055M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 60370 ns 59764 ns 11419 bytes_per_second=155.377M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 903461 ns 900298 ns 769 bytes_per_second=162.466M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 950861 ns 946309 ns 744 bytes_per_second=154.567M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 978464 ns 977109 ns 727 bytes_per_second=149.695M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 6457 ns 6451 ns 105751 bytes_per_second=335.891M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 21589 ns 21585 ns 31528 bytes_per_second=843.832M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 28415 ns 28346 ns 24531 bytes_per_second=642.569M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 382945 ns 382345 ns 1850 bytes_per_second=742.711M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 372741 ns 372615 ns 1829 bytes_per_second=762.106M/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 425632 ns 425422 ns 1617 bytes_per_second=667.506M/s
Note that decompressing the 1024 * 1024 compressed data gets faster; the other cases are unchanged.
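To put a number on that comparison, here is a small sketch that computes the fraction of CPU time saved, (before - after) / before, for the ZeroCopy InputBytes:1048576/PerReadBytes:1048576 case. The CPU times are copied from the M1 Pro "Before" run and the run after reducing the Resize calls; the struct and function names are hypothetical. With these figures the saving comes out at roughly 2.8% for GZIP, 5.0% for ZSTD and 6.9% for LZ4_FRAME, which fits the earlier advice that a faster codec exposes the stream's overhead more clearly. These are single runs, so treat the percentages as indicative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One before/after pair of CPU times (in ns) for a codec.
struct Sample {
  std::string codec;
  double before_cpu_ns;
  double after_cpu_ns;
};

// Fraction of CPU time saved: (before - after) / before.
inline double RelativeImprovement(const Sample& s) {
  assert(s.before_cpu_ns > 0);
  return (s.before_cpu_ns - s.after_cpu_ns) / s.before_cpu_ns;
}

// CPU column of CompressionInputZeroCopyBenchmark at
// InputBytes:1048576/PerReadBytes:1048576, copied from the M1 Pro output above.
inline std::vector<Sample> ZeroCopyOneMiBSamples() {
  return {
      {"GZIP", 1432126.0, 1391512.0},
      {"ZSTD", 1028287.0, 977109.0},
      {"LZ4_FRAME", 456781.0, 425422.0},
  };
}
```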
On my Windows WSL Ubuntu 22 machine, AMD 3800X with gcc 11.4, Release (-O2):
After:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 15113 ns 15153 ns 45795 bytes_per_second=79.4275Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 121673 ns 121761 ns 5686 bytes_per_second=75.8248Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 122201 ns 122270 ns 5718 bytes_per_second=75.5095Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1918363 ns 1918733 ns 357 bytes_per_second=75.2846Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1925816 ns 1926215 ns 362 bytes_per_second=74.9922Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1949610 ns 1950057 ns 363 bytes_per_second=74.0753Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 15292 ns 15331 ns 44844 bytes_per_second=78.5026Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 122159 ns 122242 ns 5717 bytes_per_second=75.5267Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 122144 ns 122221 ns 5687 bytes_per_second=75.5394Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1916707 ns 1917085 ns 361 bytes_per_second=75.3494Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1936225 ns 1936580 ns 356 bytes_per_second=74.5909Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1974165 ns 1974662 ns 354 bytes_per_second=73.1523Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 7642 ns 7655 ns 91056 bytes_per_second=154.365Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 39511 ns 39541 ns 17486 bytes_per_second=233.445Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 40372 ns 40405 ns 17402 bytes_per_second=228.45Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 742301 ns 742596 ns 942 bytes_per_second=196.959Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 741678 ns 741958 ns 946 bytes_per_second=197.129Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 749815 ns 750119 ns 938 bytes_per_second=194.984Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 8083 ns 8097 ns 86602 bytes_per_second=145.937Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 40381 ns 40418 ns 17284 bytes_per_second=228.379Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 40977 ns 41010 ns 16777 bytes_per_second=225.084Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 743131 ns 743474 ns 943 bytes_per_second=196.727Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 753482 ns 753796 ns 920 bytes_per_second=194.033Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 763397 ns 763723 ns 914 bytes_per_second=191.511Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2627 ns 2658 ns 259151 bytes_per_second=801.929Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 14589 ns 14627 ns 47447 bytes_per_second=1.21655Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15057 ns 15091 ns 47015 bytes_per_second=1.17918Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 292254 ns 292447 ns 2399 bytes_per_second=973.26Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 296420 ns 296544 ns 2361 bytes_per_second=959.815Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 299178 ns 299350 ns 2342 bytes_per_second=950.818Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3133 ns 3165 ns 222458 bytes_per_second=673.399Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 15510 ns 15550 ns 44675 bytes_per_second=1.14436Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15672 ns 15700 ns 44787 bytes_per_second=1.1334Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 296992 ns 297173 ns 2367 bytes_per_second=957.784Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 302407 ns 302596 ns 2294 bytes_per_second=940.617Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 304691 ns 304865 ns 2288 bytes_per_second=933.618Mi/s
Before:
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 15091 ns 15129 ns 44783 bytes_per_second=79.5491Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 124276 ns 124365 ns 5609 bytes_per_second=74.2374Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 125119 ns 125202 ns 5581 bytes_per_second=73.7413Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 1967307 ns 1967803 ns 357 bytes_per_second=73.4073Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 1966845 ns 1967298 ns 358 bytes_per_second=73.4262Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1969676 ns 1970107 ns 355 bytes_per_second=73.3215Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:8192/PerReadBytes:8192 15471 ns 15510 ns 44616 bytes_per_second=77.5999Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:8192 125884 ns 125971 ns 5556 bytes_per_second=73.2907Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:65536/PerReadBytes:65536 123910 ns 123991 ns 5612 bytes_per_second=74.4612Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:8192 2012901 ns 2013295 ns 354 bytes_per_second=71.7486Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:65536 2019119 ns 2019770 ns 349 bytes_per_second=71.5186Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::GZIP>/InputBytes:1048576/PerReadBytes:1048576 1979235 ns 1979722 ns 346 bytes_per_second=72.9654Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 7864 ns 7883 ns 88901 bytes_per_second=149.899Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 40161 ns 40195 ns 17269 bytes_per_second=229.646Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 40392 ns 40423 ns 17157 bytes_per_second=228.351Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 762881 ns 763268 ns 942 bytes_per_second=191.625Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 749413 ns 749734 ns 929 bytes_per_second=195.084Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 763498 ns 763866 ns 918 bytes_per_second=191.475Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:8192/PerReadBytes:8192 8482 ns 8500 ns 81298 bytes_per_second=139.019Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:8192 40456 ns 40489 ns 16916 bytes_per_second=227.979Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:65536/PerReadBytes:65536 41042 ns 41078 ns 16997 bytes_per_second=224.709Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:8192 741352 ns 741649 ns 930 bytes_per_second=197.211Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:65536 749918 ns 750239 ns 936 bytes_per_second=194.953Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::ZSTD>/InputBytes:1048576/PerReadBytes:1048576 758674 ns 758991 ns 930 bytes_per_second=192.705Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2645 ns 2675 ns 261762 bytes_per_second=796.956Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 14682 ns 14715 ns 46999 bytes_per_second=1.20928Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15221 ns 15248 ns 45742 bytes_per_second=1.16704Gi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 301442 ns 301624 ns 2331 bytes_per_second=943.649Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 307876 ns 308057 ns 2276 bytes_per_second=923.943Mi/s
CompressionInputZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 304182 ns 304369 ns 2301 bytes_per_second=935.136Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3257 ns 3288 ns 211478 bytes_per_second=648.168Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 16088 ns 16128 ns 43704 bytes_per_second=1.10335Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 16290 ns 16319 ns 42819 bytes_per_second=1.09042Gi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 305555 ns 305746 ns 2311 bytes_per_second=930.925Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 308251 ns 308442 ns 2274 bytes_per_second=922.79Mi/s
CompressionInputNonZeroCopyBenchmark<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 310809 ns 311000 ns 2257 bytes_per_second=915.199Mi/s
cc @pitrou @felipecrv would you mind taking a look?
Damn, the benchmark results on my new M2 macOS machine are so unstable... I'll test it on my PC.
On my 3800X, in WSL2 Ubuntu 22 with gcc 11.4:
Before:
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2907 ns 2951 ns 236137 bytes_per_second=722.253Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 14712 ns 14757 ns 47026 bytes_per_second=1.20586Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15105 ns 15141 ns 46329 bytes_per_second=1.17528Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 291833 ns 292024 ns 2367 bytes_per_second=974.669Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 299113 ns 299305 ns 2342 bytes_per_second=950.959Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 302720 ns 302917 ns 2305 bytes_per_second=939.621Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3395 ns 3430 ns 204188 bytes_per_second=621.394Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 16068 ns 16115 ns 43575 bytes_per_second=1.10422Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 16481 ns 16520 ns 42656 bytes_per_second=1.07716Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 302145 ns 302352 ns 2330 bytes_per_second=941.375Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 306038 ns 306229 ns 2286 bytes_per_second=929.459Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 310153 ns 310357 ns 2262 bytes_per_second=917.094Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2942 ns 2984 ns 234527 bytes_per_second=714.399Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 16035 ns 16081 ns 43395 bytes_per_second=1.10658Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15628 ns 15666 ns 44262 bytes_per_second=1.13587Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 310841 ns 311057 ns 2227 bytes_per_second=915.031Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 304648 ns 304836 ns 2269 bytes_per_second=933.704Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 299733 ns 299932 ns 2299 bytes_per_second=948.972Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3581 ns 3619 ns 195202 bytes_per_second=589.04Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 17593 ns 17636 ns 39367 bytes_per_second=1.009Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 17029 ns 17068 ns 41040 bytes_per_second=1.04259Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 320371 ns 320586 ns 2140 bytes_per_second=887.834Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 317159 ns 317359 ns 2200 bytes_per_second=896.861Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 309201 ns 309384 ns 2258 bytes_per_second=919.979Mi/s
After:
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2761 ns 2795 ns 247733 bytes_per_second=762.723Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 14625 ns 14666 ns 46769 bytes_per_second=1.21333Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15123 ns 15160 ns 46801 bytes_per_second=1.17383Gi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 288348 ns 288527 ns 2423 bytes_per_second=986.484Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 294416 ns 294585 ns 2367 bytes_per_second=966.197Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 297685 ns 297902 ns 2349 bytes_per_second=955.438Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3194 ns 3226 ns 216546 bytes_per_second=660.737Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 15455 ns 15497 ns 45292 bytes_per_second=1.14826Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 16032 ns 16067 ns 43646 bytes_per_second=1.10751Gi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 295400 ns 295587 ns 2352 bytes_per_second=962.921Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 297931 ns 298102 ns 2336 bytes_per_second=954.796Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 300221 ns 300394 ns 2322 bytes_per_second=947.512Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 2849 ns 2881 ns 241315 bytes_per_second=739.774Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 15588 ns 15629 ns 44292 bytes_per_second=1.13861Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 15303 ns 15335 ns 45736 bytes_per_second=1.16039Gi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 307433 ns 307581 ns 2290 bytes_per_second=925.372Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 302313 ns 302499 ns 2316 bytes_per_second=940.92Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 296901 ns 297087 ns 2366 bytes_per_second=958.061Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 3208 ns 3244 ns 216204 bytes_per_second=656.994Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 16401 ns 16444 ns 42091 bytes_per_second=1.08215Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 16093 ns 16129 ns 44188 bytes_per_second=1.10331Gi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 312056 ns 312247 ns 2243 bytes_per_second=911.544Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 306802 ns 306988 ns 2288 bytes_per_second=927.16Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 306993 ns 307203 ns 2268 bytes_per_second=926.512Mi/s
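For anyone who wants to compare the two dumps above without eyeballing them, here is a minimal sketch that parses rows of this shape and computes a per-benchmark wall-time ratio. The helper names (`parse_row`, `speedups`) are made up for illustration and are not part of Arrow or Google Benchmark; the regex assumes exactly the column layout pasted above.

```python
import re

# Matches one console row of the shape pasted above:
#   <name> <real> ns <cpu> ns <iterations> bytes_per_second=<rate><unit>/s
# The layout is an assumption taken from the pasted output, not a stable format.
ROW = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<real_ns>\d+) ns\s+"
    r"(?P<cpu_ns>\d+) ns\s+"
    r"(?P<iterations>\d+)\s+"
    r"bytes_per_second=(?P<rate>[\d.]+)(?P<unit>[MG])i?/s$"
)


def parse_row(line):
    """Return (name, real_time_ns, throughput_in_MiB_per_s), or None if not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    scale = 1024.0 if m["unit"] == "G" else 1.0  # normalize Gi/s to Mi/s
    return m["name"], int(m["real_ns"]), float(m["rate"]) * scale


def speedups(before_lines, after_lines):
    """Map benchmark name -> before_time / after_time (values > 1 mean faster after)."""
    before = {}
    for line in before_lines:
        row = parse_row(line)
        if row is not None:
            before[row[0]] = row[1]
    result = {}
    for line in after_lines:
        row = parse_row(line)
        if row is not None and row[0] in before:
            result[row[0]] = before[row[0]] / row[1]
    return result
```

Feeding the "before" and "after" blocks as lists of lines to `speedups` gives one ratio per benchmark name. Single runs like these are noisy, so small ratios in either direction should not be over-interpreted.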
@mapleFU Can you update to the latest git main?
Done. It's a bit late in my timezone, so I'm going to sleep now.
Here are the benchmark results (Ubuntu 22.04, AMD Zen 2 CPU):
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (24)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 1.102 GiB/sec 1.186 GiB/sec 7.628 {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 42847}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 1.074 GiB/sec 1.137 GiB/sec 5.855 {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 40979}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 1.134 GiB/sec 1.192 GiB/sec 5.100 {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43750}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 921.576 MiB/sec 962.690 MiB/sec 4.461 {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2276}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 616.148 MiB/sec 643.130 MiB/sec 4.379 {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 204011}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 708.346 MiB/sec 737.983 MiB/sec 4.184 {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 238344}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 889.445 MiB/sec 921.203 MiB/sec 3.570 {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2202}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 1.183 GiB/sec 1.222 GiB/sec 3.276 {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 47400}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 1.194 GiB/sec 1.230 GiB/sec 3.003 {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 47112}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 920.745 MiB/sec 941.653 MiB/sec 2.271 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2221}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 918.199 MiB/sec 937.309 MiB/sec 2.081 {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2247}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 910.037 MiB/sec 924.471 MiB/sec 1.586 {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2261}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 1.176 GiB/sec 1.192 GiB/sec 1.299 {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 45065}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 598.409 MiB/sec 605.872 MiB/sec 1.247 {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 195099}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 943.018 MiB/sec 953.685 MiB/sec 1.131 {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2298}
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 916.860 MiB/sec 924.973 MiB/sec 0.885 {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2231}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 909.318 MiB/sec 915.330 MiB/sec 0.661 {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2187}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536 934.528 MiB/sec 940.318 MiB/sec 0.620 {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2322}
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 892.891 MiB/sec 896.477 MiB/sec 0.402 {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2194}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192 1.306 GiB/sec 1.308 GiB/sec 0.192 {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 50894}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192 958.533 MiB/sec 959.449 MiB/sec 0.096 {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2459}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192 718.293 MiB/sec 712.813 MiB/sec -0.763 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 239597}
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576 942.925 MiB/sec 935.223 MiB/sec -0.817 {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2314}
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536 1.265 GiB/sec 1.251 GiB/sec -1.104 {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 49579}
Slightly faster (see the percentages).
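As a sanity check on the "change %" column, here is a small sketch of how such a percentage can be derived from the baseline and contender throughput strings. The function names are hypothetical and this is not the comparison tool's actual code; it only assumes the `MiB/sec` and `GiB/sec` units that appear in the table.

```python
# Units seen in the table above, expressed in MiB/sec.
UNIT_TO_MIB = {"MiB/sec": 1.0, "GiB/sec": 1024.0}


def to_mib_per_sec(text):
    """Convert a string such as '1.102 GiB/sec' to a float in MiB/sec."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MIB[unit]


def change_percent(baseline, contender):
    """Relative throughput change of contender vs. baseline, in percent."""
    base = to_mib_per_sec(baseline)
    cont = to_mib_per_sec(contender)
    return (cont - base) / base * 100.0
```

For example, `change_percent("921.576 MiB/sec", "962.690 MiB/sec")` comes out at roughly 4.46, in line with the table. Rows printed in GiB/sec are rounded to three decimals, so recomputing from them only approximates the reported value.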
Thanks! I see that there is a baseline to compare against there.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit dff8f9c7790e74cb0915a8b164055848a076c0e3. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
@pitrou Should I update the code or add other benchmarks?
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit dff8f9c7790e74cb0915a8b164055848a076c0e3.
There were 20 benchmark results indicating a performance regression:
- Pull Request Run on arm64-m6g-linux-compute at 2024-03-26 18:46:07Z
  - FilterOverheadIsolated (C++) with params=complex_expression/batch_size:10000/null_prob:0/bool_true_prob:50/real_time, source=cpp-micro, suite=arrow-acero-filter-benchmark
  - FilterOverhead (C++) with params=ref_only_expression/batch_size:10000/null_prob:0/bool_true_prob:50/real_time, source=cpp-micro, suite=arrow-acero-filter-benchmark
- and 18 more (see the report linked below)
The full Conbench report has more details.
😅 I didn't change any logic related to expr...
@ursabot please benchmark lang=R,Python
Supported benchmark command examples:
@ursabot benchmark help
To run all benchmarks:
@ursabot please benchmark
To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java
@ursabot please benchmark lang=JavaScript
To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*
To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3
For other command=cpp-micro options, please see https://github.com/voltrondata-labs/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=R
Commit dff8f9c7790e74cb0915a8b164055848a076c0e3 already has scheduled benchmark runs.