vortex
Boolean <-> Integer duality
We should support converting between strictly sorted integers and boolean masks. We may need an array type that can go in both directions?
This could allow us to remove the RoaringUInt array.
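To make the duality concrete, here is a minimal sketch of the two conversions over plain slices. The function names `mask_to_indices` and `indices_to_mask` are hypothetical and are not part of Vortex's API; a real array type would presumably work over bit-packed buffers rather than `Vec<bool>`.

```rust
/// Convert a boolean mask into the strictly sorted indices of its set positions.
/// (Hypothetical helper, for illustration only.)
fn mask_to_indices(mask: &[bool]) -> Vec<u64> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &set)| set.then_some(i as u64))
        .collect()
}

/// Convert strictly sorted indices back into a boolean mask of length `len`.
/// Returns `None` if the indices are not strictly increasing or if any index
/// falls outside `len`.
fn indices_to_mask(indices: &[u64], len: usize) -> Option<Vec<bool>> {
    let mut mask = vec![false; len];
    let mut prev: Option<u64> = None;
    for &idx in indices {
        let out_of_order = prev.is_some_and(|p| idx <= p);
        if out_of_order || (idx as usize) >= len {
            return None;
        }
        mask[idx as usize] = true;
        prev = Some(idx);
    }
    Some(mask)
}
```

The round trip is lossless as long as the mask length is carried alongside the indices, which is one reason a dedicated array type might be needed.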
Capturing from slack:
Currently our TableProvider's pushdown is bottlenecked by take(varbin).
DataFusion defers to arrow's filter_bytes function to turn the predicate mask into a new ArrayRef:
https://github.com/apache/arrow-rs/blob/920a94470db04722c74b599a227f930946d0da80/arrow-select/src/filter.rs#L660-L689
We want our own boolean builder to construct these masks and calculate run lengths, which our implementation of take() can then use to alternate between slicing and indexing.
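A rough sketch of that idea, assuming a varbin column laid out as an `offsets` slice plus a flat `bytes` buffer. The names `mask_to_runs` and `filter_varbin`, and the `min_avg_run` threshold, are hypothetical illustrations and not the Vortex or arrow-rs implementation.

```rust
/// Contiguous runs of `true` in the mask, as half-open `(start, end)` ranges.
fn mask_to_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = None;
    for (i, &set) in mask.iter().enumerate() {
        match (set, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// Filter a variable-length binary column. `offsets` has one more entry than
/// `mask`; element `i` occupies `bytes[offsets[i]..offsets[i + 1]]`.
/// `min_avg_run` is an assumed heuristic: at or above this average run length
/// we copy whole runs, otherwise we copy element by element.
fn filter_varbin(
    offsets: &[usize],
    bytes: &[u8],
    mask: &[bool],
    min_avg_run: usize,
) -> (Vec<usize>, Vec<u8>) {
    assert_eq!(offsets.len(), mask.len() + 1);
    let runs = mask_to_runs(mask);
    let selected: usize = runs.iter().map(|(s, e)| e - s).sum();
    let mut out_offsets = vec![0usize];
    let mut out_bytes = Vec::new();

    if !runs.is_empty() && selected / runs.len() >= min_avg_run {
        // Slicing path: one bulk copy of the byte range covered by each run.
        for &(s, e) in &runs {
            let base = out_bytes.len();
            out_bytes.extend_from_slice(&bytes[offsets[s]..offsets[e]]);
            // Rebase the run's end offsets onto the output buffer.
            out_offsets.extend(offsets[s + 1..=e].iter().map(|&o| base + o - offsets[s]));
        }
    } else {
        // Indexing path: copy each selected element individually.
        for (i, &set) in mask.iter().enumerate() {
            if set {
                out_bytes.extend_from_slice(&bytes[offsets[i]..offsets[i + 1]]);
                out_offsets.push(out_bytes.len());
            }
        }
    }
    (out_offsets, out_bytes)
}
```

Both paths produce identical output; the run lengths only decide which copy strategy is likely cheaper. A builder that tracks runs while the mask is constructed would avoid the extra pass that `mask_to_runs` makes here.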
I think we resolved this in #1327 and #1351. #1327 switched the predicate/mask argument of filter to FilterMask. I'll close this, but obviously re-open if there's more work to do!
I profiled the entire TPC-H suite and the time seems to be spent mostly in DataFusion. Profile: _target_release_tpch_benchmark 2024-12-16 11.51 profile.json.gz