Boolean <-> Integer duality

Open · gatesn opened this issue 1 year ago · 1 comment

We should support converting between strictly sorted integers and boolean masks. We may need an array type that can go in both directions?

This could allow us to remove the RoaringUInt array.
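The two conversions can be sketched as follows. This is a minimal sketch that assumes a plain Vec<bool> mask and u64 indices. mask_to_indices and indices_to_mask are hypothetical helpers, not Vortex APIs, and a real implementation would operate on packed bitmaps and encoded arrays.

```rust
// Hypothetical helpers (not Vortex APIs): convert between a boolean mask
// and the strictly increasing list of positions where the mask is true.

/// Collects the positions of `true` values. Because positions are visited
/// in order, the result is always strictly increasing.
fn mask_to_indices(mask: &[bool]) -> Vec<u64> {
    mask.iter()
        .enumerate()
        .filter(|&(_, &set)| set)
        .map(|(i, _)| i as u64)
        .collect()
}

/// Expands strictly increasing indices back into a mask of length `len`.
/// Panics if the indices are not strictly increasing or are out of bounds.
fn indices_to_mask(indices: &[u64], len: usize) -> Vec<bool> {
    let mut mask = vec![false; len];
    let mut prev: Option<u64> = None;
    for &idx in indices {
        assert!(
            prev.map_or(true, |p| idx > p),
            "indices must be strictly increasing"
        );
        assert!((idx as usize) < len, "index out of bounds");
        mask[idx as usize] = true;
        prev = Some(idx);
    }
    mask
}
```

For example, the mask [true, false, false, true, true, false] maps to the indices [0, 3, 4] and back again. Strictness is what makes the two forms interchangeable: duplicate or out-of-order indices have no boolean-mask equivalent.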

gatesn, Jul 09 '24 15:07

Capturing from Slack:

Currently, our TableProvider's pushdown is bottlenecked by take(varbin).


DataFusion defers to arrow-rs's filter_bytes function to turn the predicate mask into a new ArrayRef:

https://github.com/apache/arrow-rs/blob/920a94470db04722c74b599a227f930946d0da80/arrow-select/src/filter.rs#L660-L689

We want our own boolean builder that constructs these masks and computes their run lengths, so that our implementation of take() can use those runs to alternate between slicing and indexing.
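One way to picture the run-length idea is the following minimal sketch, which assumes plain slices rather than Vortex arrays. true_runs and filter_by_runs are hypothetical names, and min_run is an illustrative threshold, not a tuned value. Long runs of selected rows become one contiguous copy (for a VarBin array, one contiguous range of offsets and bytes), while short runs fall back to per-index gathers. This is similar in spirit to the slice- and index-based iteration strategies in the arrow-rs filter code linked above.

```rust
use std::ops::Range;

// Hypothetical sketch (not Vortex's or arrow-rs's implementation).

/// Returns half-open ranges `start..end` covering each maximal run of
/// `true` values in the mask, in ascending order.
fn true_runs(mask: &[bool]) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &set) in mask.iter().enumerate() {
        match (set, start) {
            // A run begins.
            (true, None) => start = Some(i),
            // A run ends just before position `i`.
            (false, Some(s)) => {
                runs.push(s..i);
                start = None;
            }
            // Inside a run, or inside a gap: nothing to do.
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        runs.push(s..mask.len());
    }
    runs
}

/// Filters `values` by `mask`. Runs of at least `min_run` rows are copied
/// with a single slice copy; shorter runs are gathered one index at a time.
/// `min_run` is an illustrative knob, not a real Vortex parameter.
fn filter_by_runs<T: Clone>(values: &[T], mask: &[bool], min_run: usize) -> Vec<T> {
    assert_eq!(values.len(), mask.len(), "mask length must match values");
    let mut out = Vec::new();
    for run in true_runs(mask) {
        if run.len() >= min_run {
            // Slicing: one contiguous copy for the whole run.
            out.extend_from_slice(&values[run]);
        } else {
            // Indexing: copy each selected row individually.
            for i in run {
                out.push(values[i].clone());
            }
        }
    }
    out
}
```

With the mask [T, T, T, F, T, F, F, T, T] and min_run = 2, the runs 0..3 and 7..9 are copied as slices, and the single-row run 4..5 is gathered by index.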

a10y, Jul 12 '24 22:07

I think we resolved this in #1327 and #1351; #1327 switched the predicate/mask argument of filter to FilterMask. I'll close this, but obviously reopen it if there's more work to do!

I profiled the entire TPC-H suite, and the time seems to be spent mostly in DataFusion code. Profile: _target_release_tpch_benchmark 2024-12-16 11.51 profile.json.gz

[Screenshots: 2024-12-16 at 11:58:27 AM and 2024-12-16 at 11:59:23 AM]

danking, Dec 16 '24 17:12