arrow GH-40431: [C++] Try to check/alloc the TempVectorStack size as HashBatch needed

Rationale for this change

Try to check/alloc the TempVectorStack size as HashBatch needed. For now, HashBatch api need more TempVectorStack space if we only input small recordbatch rows.

What changes are included in this PR?

Add estimate size for TempVectorStack's init or alloced size's check before we go into HashMultiColumn.

Are these changes tested?

Are there any user-facing changes?

GitHub Issue: #40431

Mar 12 '24 10:03 ZhangHuiGui

cc @kou , it's a temporary fix. And i haven't add ut for now. PTAL?

Mar 12 '24 10:03 ZhangHuiGui

@kou how about this refactor?

Mar 25 '24 04:03 ZhangHuiGui

@kou Thank you for your review! @westonpace PTAL? This is a refactoring job, in order to allow users in need to better use the HashBatch related API.

Mar 26 '24 14:03 ZhangHuiGui

cc @zanmato1984 if you're interested in this

Mar 26 '24 14:03 mapleFU

@zanmato1984 Thank you very much for your suggestion, the code looks clearer!

Mar 27 '24 14:03 ZhangHuiGui

My one last suggestion :)

Thanks!

Mar 28 '24 01:03 ZhangHuiGui

Thanks @zanmato1984 !

cc @felipecrv @pitrou do we have some reviewers for ASOF join?

Mar 28 '24 06:03 mapleFU

@pitrou PTAL! The new commit include two things:

Move these codes(key_hash/key_map/light_array) to internal! Besides, seems it's unnecessary to use internal namespace for them. The purpose we want is just prevent user's calling/
Simple refactor to simplify some codes in TempVectorStack.

Mar 30 '24 05:03 ZhangHuiGui

@github-actions crossbow submit -g cpp

Apr 02 '24 12:04 pitrou

Revision: 2cc866f60f06dba57e1811c2fd302aa7eea9146e

Submitted crossbow builds: ursacomputing/crossbow @ actions-e8962c33a3

Task	Status
test-alpine-linux-cpp
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-cuda-cpp
test-debian-12-cpp-amd64
test-debian-12-cpp-i386
test-fedora-39-cpp
test-ubuntu-20.04-cpp
test-ubuntu-20.04-cpp-bundled
test-ubuntu-20.04-cpp-minimal-with-formats
test-ubuntu-20.04-cpp-thread-sanitizer
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-gcc-14

Apr 02 '24 12:04 github-actions[bot]

@github-actions crossbow submit -g cpp

Apr 02 '24 13:04 pitrou

Revision: 5180559bf5d4c074c81eff419a51a727528c2a07

Submitted crossbow builds: ursacomputing/crossbow @ actions-165b3afd77

Task	Status
test-alpine-linux-cpp
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-cuda-cpp
test-debian-12-cpp-amd64
test-debian-12-cpp-i386
test-fedora-39-cpp
test-ubuntu-20.04-cpp
test-ubuntu-20.04-cpp-bundled
test-ubuntu-20.04-cpp-minimal-with-formats
test-ubuntu-20.04-cpp-thread-sanitizer
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-gcc-14

Apr 02 '24 13:04 github-actions[bot]

@github-actions crossbow submit -g cpp

Apr 02 '24 14:04 pitrou

Revision: 0722f88046cb747861f62dda235f4ed44677ac3e

Submitted crossbow builds: ursacomputing/crossbow @ actions-5141a1fc14

Task	Status
test-alpine-linux-cpp
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-cuda-cpp
test-debian-12-cpp-amd64
test-debian-12-cpp-i386
test-fedora-39-cpp
test-ubuntu-20.04-cpp
test-ubuntu-20.04-cpp-bundled
test-ubuntu-20.04-cpp-minimal-with-formats
test-ubuntu-20.04-cpp-thread-sanitizer
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-gcc-14

Apr 02 '24 14:04 github-actions[bot]

CI failures are unrelated, I'll merge.

Apr 02 '24 16:04 pitrou

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8163d026b3c56253d9e33c0129fac5d9ba573c53.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.

Apr 03 '24 08:04 conbench-apache-arrow[bot]

arrow arrow copied to clipboard

GH-40431: [C++] Try to check/alloc the TempVectorStack size as HashBatch needed

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

arrow
arrow copied to clipboard