arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-40431: [C++] Try to check/alloc the TempVectorStack size as HashBatch needed

Open ZhangHuiGui opened this issue 11 months ago • 1 comments

Rationale for this change

Try to check/alloc the TempVectorStack size as HashBatch needed. For now, HashBatch api need more TempVectorStack space if we only input small recordbatch rows.

What changes are included in this PR?

Add estimate size for TempVectorStack's init or alloced size's check before we go into HashMultiColumn.

Are these changes tested?

Are there any user-facing changes?

  • GitHub Issue: #40431

ZhangHuiGui avatar Mar 12 '24 10:03 ZhangHuiGui

cc @kou , it's a temporary fix. And i haven't add ut for now. PTAL?

ZhangHuiGui avatar Mar 12 '24 10:03 ZhangHuiGui

@kou how about this refactor?

ZhangHuiGui avatar Mar 25 '24 04:03 ZhangHuiGui

@kou Thank you for your review! @westonpace PTAL? This is a refactoring job, in order to allow users in need to better use the HashBatch related API.

ZhangHuiGui avatar Mar 26 '24 14:03 ZhangHuiGui

cc @zanmato1984 if you're interested in this

mapleFU avatar Mar 26 '24 14:03 mapleFU

@zanmato1984 Thank you very much for your suggestion, the code looks clearer!

ZhangHuiGui avatar Mar 27 '24 14:03 ZhangHuiGui

My one last suggestion :)

Thanks!

ZhangHuiGui avatar Mar 28 '24 01:03 ZhangHuiGui

Thanks @zanmato1984 !

cc @felipecrv @pitrou do we have some reviewers for ASOF join?

mapleFU avatar Mar 28 '24 06:03 mapleFU

@pitrou PTAL! The new commit include two things:

  1. Move these codes(key_hash/key_map/light_array) to internal! Besides, seems it's unnecessary to use internal namespace for them. The purpose we want is just prevent user's calling/
  2. Simple refactor to simplify some codes in TempVectorStack.

ZhangHuiGui avatar Mar 30 '24 05:03 ZhangHuiGui

@github-actions crossbow submit -g cpp

pitrou avatar Apr 02 '24 12:04 pitrou

Revision: 2cc866f60f06dba57e1811c2fd302aa7eea9146e

Submitted crossbow builds: ursacomputing/crossbow @ actions-e8962c33a3

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

github-actions[bot] avatar Apr 02 '24 12:04 github-actions[bot]

@github-actions crossbow submit -g cpp

pitrou avatar Apr 02 '24 13:04 pitrou

Revision: 5180559bf5d4c074c81eff419a51a727528c2a07

Submitted crossbow builds: ursacomputing/crossbow @ actions-165b3afd77

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

github-actions[bot] avatar Apr 02 '24 13:04 github-actions[bot]

@github-actions crossbow submit -g cpp

pitrou avatar Apr 02 '24 14:04 pitrou

Revision: 0722f88046cb747861f62dda235f4ed44677ac3e

Submitted crossbow builds: ursacomputing/crossbow @ actions-5141a1fc14

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

github-actions[bot] avatar Apr 02 '24 14:04 github-actions[bot]

CI failures are unrelated, I'll merge.

pitrou avatar Apr 02 '24 16:04 pitrou

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8163d026b3c56253d9e33c0129fac5d9ba573c53.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.