arrow
arrow copied to clipboard
GH-40431: [C++] Try to check/alloc the TempVectorStack size as HashBatch needed
Rationale for this change
Try to check/alloc the TempVectorStack size as HashBatch needed. For now, HashBatch api need more TempVectorStack space if we only input small recordbatch rows.
What changes are included in this PR?
Add estimate size for TempVectorStack's init or alloced size's check before we go into HashMultiColumn.
Are these changes tested?
Are there any user-facing changes?
- GitHub Issue: #40431
cc @kou , it's a temporary fix. And i haven't add ut for now. PTAL?
@kou how about this refactor?
@kou Thank you for your review! @westonpace PTAL? This is a refactoring job, in order to allow users in need to better use the HashBatch related API.
cc @zanmato1984 if you're interested in this
@zanmato1984 Thank you very much for your suggestion, the code looks clearer!
My one last suggestion :)
Thanks!
Thanks @zanmato1984 !
cc @felipecrv @pitrou do we have some reviewers for ASOF join?
@pitrou PTAL! The new commit include two things:
- Move these codes(key_hash/key_map/light_array) to internal! Besides, seems it's unnecessary to use internal namespace for them. The purpose we want is just prevent user's calling/
- Simple refactor to simplify some codes in TempVectorStack.
@github-actions crossbow submit -g cpp
Revision: 2cc866f60f06dba57e1811c2fd302aa7eea9146e
Submitted crossbow builds: ursacomputing/crossbow @ actions-e8962c33a3
@github-actions crossbow submit -g cpp
Revision: 5180559bf5d4c074c81eff419a51a727528c2a07
Submitted crossbow builds: ursacomputing/crossbow @ actions-165b3afd77
@github-actions crossbow submit -g cpp
Revision: 0722f88046cb747861f62dda235f4ed44677ac3e
Submitted crossbow builds: ursacomputing/crossbow @ actions-5141a1fc14
CI failures are unrelated, I'll merge.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8163d026b3c56253d9e33c0129fac5d9ba573c53.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.