
[WIP] Use HashStringAllocator for raw_vector rows in HashLookup

Open czentgr opened this issue 1 year ago • 2 comments

The rows in the HashLookup are stored as a raw_vector. The raw_vector uses a malloc-style allocation to acquire aligned memory. This was found to contribute a large amount of unaccounted memory, mainly when using HashProbe but also with other operators.

This change allows the usage of a custom allocator for the raw_vector. By default it keeps the current way of allocating and deallocating.

For raw_vectors representing rows in the HashLookup, the allocator used is the HashStringAllocator, which allows the memory to be accounted for.
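To illustrate the idea, here is a minimal sketch of a vector with a pluggable allocator that falls back to aligned malloc-style allocation by default. It is not the Velox code: `RawAllocator`, `MallocAllocator`, `CountingAllocator`, and `SimpleRawVector` are hypothetical names invented for this example, and `CountingAllocator` only stands in for an allocator that accounts for memory, such as the HashStringAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical allocator interface for this sketch; not a Velox type.
struct RawAllocator {
  virtual ~RawAllocator() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void release(void* ptr, std::size_t bytes) = 0;
};

// Default behavior: aligned malloc-style allocation with no accounting.
struct MallocAllocator : RawAllocator {
  static constexpr std::size_t kAlignment = 64;

  void* allocate(std::size_t bytes) override {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t rounded = (bytes + kAlignment - 1) / kAlignment * kAlignment;
    void* ptr = std::aligned_alloc(kAlignment, rounded);
    if (ptr == nullptr) {
      std::abort();
    }
    return ptr;
  }

  void release(void* ptr, std::size_t /*bytes*/) override { std::free(ptr); }
};

// Stand-in for an accounting allocator: tracks the bytes currently in use.
struct CountingAllocator : RawAllocator {
  std::size_t usedBytes = 0;

  void* allocate(std::size_t bytes) override {
    usedBytes += bytes;
    return backing_.allocate(bytes);
  }

  void release(void* ptr, std::size_t bytes) override {
    usedBytes -= bytes;
    backing_.release(ptr, bytes);
  }

 private:
  MallocAllocator backing_;
};

// Simplified vector of trivially copyable elements with an optional allocator.
template <typename T>
class SimpleRawVector {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");

 public:
  // A null allocator keeps the default malloc-style path.
  explicit SimpleRawVector(RawAllocator* allocator = nullptr)
      : allocator_(allocator != nullptr ? allocator : &defaultAllocator()) {}

  SimpleRawVector(const SimpleRawVector&) = delete;
  SimpleRawVector& operator=(const SimpleRawVector&) = delete;

  ~SimpleRawVector() {
    if (data_ != nullptr) {
      allocator_->release(data_, capacity_ * sizeof(T));
    }
  }

  void resize(std::size_t n) {
    if (n > capacity_) {
      T* fresh = static_cast<T*>(allocator_->allocate(n * sizeof(T)));
      if (size_ > 0) {
        std::memcpy(fresh, data_, size_ * sizeof(T));
      }
      if (data_ != nullptr) {
        allocator_->release(data_, capacity_ * sizeof(T));
      }
      data_ = fresh;
      capacity_ = n;
    }
    size_ = n;
  }

  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) { return data_[i]; }

 private:
  static RawAllocator& defaultAllocator() {
    static MallocAllocator instance;
    return instance;
  }

  RawAllocator* allocator_;
  T* data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};
```

In this sketch, a `SimpleRawVector<int32_t>` built with a `CountingAllocator*` reports its allocations through `usedBytes`, while one built without arguments allocates exactly as before.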

czentgr avatar Jun 28 '24 23:06 czentgr

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 2ece662dfb30cc243738a364d4d102fcf5a3d7f8
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/671ac4cfe8b82a00082ed753

netlify[bot] avatar Jun 28 '24 23:06 netlify[bot]

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale[bot] avatar Oct 24 '24 20:10 stale[bot]

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale[bot] avatar Jan 24 '25 03:01 stale[bot]

In the end, the solution was to pass in a memory pool to handle the allocations instead of using the HashStringAllocator. The problem with the latter was that spilled rows were lost.

The actual fix is here: https://github.com/facebookincubator/velox/pull/12582

czentgr avatar Mar 18 '25 23:03 czentgr