mwish comments

Results: 249 comments of mwish

GH-38560: [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd

> What about the macOS M1 Pro?

I've updated the results here: https://github.com/apache/arrow/pull/40335#issuecomment-1984131068. Basically, it's about 2 times faster.

GH-38560: [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd

@github-actions crossbow submit -g wheel

GH-38560: [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd

> I don't have all the context, but upgrading xsimd looks reasonable if it fixes your issues. Would you need a new release?

Hi @serge-sans-paille. I found some neon64...

GH-38560: [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd

> Could you open a separate bug in the xsimd bug tracker with a reproducer?

I'm not referring to a bug. I mean https://github.com/apache/arrow/pull/40335#issuecomment-1983644942: some bugfixes and neon64-related enhancements are not...

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

cc @pitrou @felipecrv

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

> Have you run any benchmarks?

Not yet. Let me find and run them.

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

Before optimization:

```
CompressionInputZeroCopyBenchmark/InputBytes:8192          16717 ns      16318 ns    43287 bytes_per_second=73.5798M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536         96598 ns      94595 ns     6962 bytes_per_second=97.6409M/s
CompressionInputZeroCopyBenchmark/InputBytes:1048576     1592301 ns    1589814 ns      440 bytes_per_second=90.8238M/s
CompressionInputNonZeroCopyBenchmark/InputBytes:8192       19860 ns      19794 ns    36185...
```

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

(After changing `supports_zero_copy_from_raw_` to const, my optimization would be a little faster. I'll dig into it tomorrow.)

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

Sorry for the delay; I've had too much work these two weeks. I'll improve this over the weekend.

GH-39377: [C++] IO: Reuse same buffer in CompressedInputStream

Under LLVM-17, macOS M1 Pro, Release (-O2):

After:

```
CompressionInputZeroCopyBenchmark/InputBytes:8192/PerReadBytes:8192       14066 ns      14042 ns    50325 bytes_per_second=85.509M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536/PerReadBytes:8192      81058 ns      80930 ns     8516 bytes_per_second=114.127M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536/PerReadBytes:65536     85914 ns      85865 ns     7871 bytes_per_second=107.568M/s
...
```