GH-48277: [C++][Parquet] unpack with shuffle algorithm
Rationale for this change
What changes are included in this PR?
- Add a new method for building unpacking kernels.
  - The `constexpr` code generation creates a kernel appropriate for a given input/output bit width and SIMD size.
  - I have included a number of `xsimd` fallbacks that have been merged upstream.
  - I ran extensive benchmarks and, on specific architectures, re-dispatched the sizes where the new kernel did not perform well.
- The biggest win here is SSE4.2, though AVX2 improves too.
- This is not built/tested for AVX512, though nothing fundamentally prevents it. Currently, architecture detection across the AVX512 variants is inconsistent and sometimes errors out. I would need to investigate with the upcoming `xsimd` release.
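To illustrate the idea of generating a kernel from the bit width and lane count at compile time, here is a minimal scalar sketch. It is not the code in this PR: all names (`UnpackPlan`, `MakePlan`, `UnpackScalarSketch`) are hypothetical, it assumes LSB-first packing, and it emulates in a plain loop what a SIMD version would do with a byte shuffle followed by per-lane shifts and a mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-lane plan: which byte each value starts in and how far
// it must be shifted right. A SIMD kernel would turn byte_offset into a
// shuffle mask and shift into a per-lane variable shift.
template <int kBitWidth, int kLanes>
struct UnpackPlan {
  std::array<int, kLanes> byte_offset{};
  std::array<int, kLanes> shift{};
};

// Computes the plan entirely at compile time (C++17).
template <int kBitWidth, int kLanes>
constexpr UnpackPlan<kBitWidth, kLanes> MakePlan() {
  UnpackPlan<kBitWidth, kLanes> plan{};
  for (int i = 0; i < kLanes; ++i) {
    const int first_bit = i * kBitWidth;
    plan.byte_offset[i] = first_bit / 8;
    plan.shift[i] = first_bit % 8;
  }
  return plan;
}

// Scalar stand-in for one generated kernel: unpacks kLanes values of
// kBitWidth bits each (assumed LSB-first) into 32-bit outputs.
template <int kBitWidth, int kLanes>
void UnpackScalarSketch(const std::uint8_t* in, std::uint32_t* out) {
  static_assert(kBitWidth >= 1 && kBitWidth <= 24,
                "sketch gathers 4 bytes, so width + shift must fit in 32 bits");
  static_assert(kLanes >= 1, "need at least one lane");
  assert(in != nullptr && out != nullptr);

  constexpr auto kPlan = MakePlan<kBitWidth, kLanes>();
  constexpr std::uint32_t kMask = (std::uint32_t{1} << kBitWidth) - 1;
  constexpr int kInputBytes = (kBitWidth * kLanes + 7) / 8;

  for (int i = 0; i < kLanes; ++i) {
    std::uint32_t word = 0;
    // "Shuffle" step: gather the bytes holding this value, never reading
    // past the end of the packed input.
    for (int b = 0; b < 4; ++b) {
      const int idx = kPlan.byte_offset[i] + b;
      if (idx < kInputBytes) {
        word |= std::uint32_t{in[idx]} << (8 * b);
      }
    }
    // Shift and mask step.
    out[i] = (word >> kPlan.shift[i]) & kMask;
  }
}
```

Calling `UnpackScalarSketch<3, 8>(packed, out)` reads exactly 3 input bytes and writes 8 values. The bounds check in the gather loop is only there to keep the sketch free of over-reads; a real SIMD kernel would handle that differently.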
Are these changes tested?
Yes
Are there any user-facing changes?
No
- GitHub Issue: #48277
:warning: GitHub issue #48277 has been automatically assigned in GitHub to PR creator.
@pitrou apart from R-lint, this is looking pretty good.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit a4bfe8addf409c235e0fd96eead5b489447029d0. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit a4bfe8addf409c235e0fd96eead5b489447029d0.
There were 37 benchmark results indicating a performance regression:
- Pull Request Run on `amd64-c6a-4xlarge-linux` at 2025-11-27 19:32:37Z
- and 35 more (see the report linked below)
The full Conbench report has more details.
@pitrou I'm running this locally, and I made an error when fixing the ASAN over-read problem. These latest benchmarks are not doing well.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit dd3ec0d692e4f409bd73952de9bab20d8c97c226. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit dd3ec0d692e4f409bd73952de9bab20d8c97c226.
There were 19 benchmark results indicating a performance regression:
- Pull Request Run on `amd64-c6a-4xlarge-linux` at 2025-11-28 15:12:47Z
- and 17 more (see the report linked below)
The full Conbench report has more details.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit 408ef04ad96a9752654cfe54d4de6c7c2eef08cc. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
@ursabot please benchmark lang=C++
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 408ef04ad96a9752654cfe54d4de6c7c2eef08cc.
None of the specified runs were found on the Conbench server.
The full Conbench report has more details.
@ursabot please benchmark lang=C++