mwish
mwish
> What about the macOS M1 Pro?

I've updated the results here: https://github.com/apache/arrow/pull/40335#issuecomment-1984131068

Basically, it's about 2 times faster.
@github-actions crossbow submit -g wheel
> I don't have all the context, but upgrading xsimd looks reasonable if it fixes your issues. Would you need a new release?

Hi @serge-sans-paille. I found some neon64...
> Could you open a separate bug in the xsimd bug tracker with a reproducer?

I'm not referring to a bug. I mean https://github.com/apache/arrow/pull/40335#issuecomment-1983644942: some bugfixes and neon64-related enhancements are not...
cc @pitrou @felipecrv
> Have you run any benchmarks?

Not yet. Let me find and run them.
Before optimization:

```
CompressionInputZeroCopyBenchmark/InputBytes:8192          16717 ns      16318 ns    43287 bytes_per_second=73.5798M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536         96598 ns      94595 ns     6962 bytes_per_second=97.6409M/s
CompressionInputZeroCopyBenchmark/InputBytes:1048576     1592301 ns    1589814 ns      440 bytes_per_second=90.8238M/s
CompressionInputNonZeroCopyBenchmark/InputBytes:8192       19860 ns      19794 ns    36185...
```
(After changing `supports_zero_copy_from_raw_` to const, my optimization would be a little faster. I'll dive into it tomorrow)
Sorry for the delay; I've had too much work these past two weeks. I'll improve this over the weekend.
Under LLVM-17, macOS M1 Pro, Release (-O2):

After:

```
CompressionInputZeroCopyBenchmark/InputBytes:8192/PerReadBytes:8192        14066 ns    14042 ns    50325 bytes_per_second=85.509M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536/PerReadBytes:8192       81058 ns    80930 ns     8516 bytes_per_second=114.127M/s
CompressionInputZeroCopyBenchmark/InputBytes:65536/PerReadBytes:65536      85914 ns    85865 ns     7871 bytes_per_second=107.568M/s...
```