Nicholas Gates comments

Results: 138 comments of Nicholas Gates

Report splits to DataFusion as partitions

The benefit of this is that DataFusion can choose to parallelise the rest of the execution plan, whereas in the current model, we do use all threads of the runtime...
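To make the idea concrete, here is a minimal sketch of why exposing splits as partitions helps the engine parallelise downstream work. It uses only the Rust standard library and is not DataFusion's or Vortex's API: `plan_splits` and `sum_by_partition` are hypothetical names, and the per-partition sum merely stands in for "the rest of the execution plan".

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical helper: cut `total_rows` into contiguous row ranges of at most
/// `split_size` rows. Each range plays the role of one reported partition.
pub fn plan_splits(total_rows: usize, split_size: usize) -> Vec<Range<usize>> {
    assert!(split_size > 0, "split_size must be non-zero");
    (0..total_rows)
        .step_by(split_size)
        .map(|start| start..(start + split_size).min(total_rows))
        .collect()
}

/// Stand-in for the downstream plan: because the partitions are known up front,
/// the caller is free to run one worker per partition.
pub fn sum_by_partition(data: &[u64], splits: &[Range<usize>]) -> Vec<u64> {
    thread::scope(|s| {
        // Spawn every worker first so the partitions actually run concurrently.
        let handles: Vec<_> = splits
            .iter()
            .map(|range| {
                let slice = &data[range.clone()];
                s.spawn(move || slice.iter().sum::<u64>())
            })
            .collect();
        // Join in partition order so results line up with `splits`.
        handles
            .into_iter()
            .map(|h| h.join().expect("partition worker panicked"))
            .collect()
    })
}
```

In this sketch the scheduling decision sits with the caller of `sum_by_partition`, not with the code that produced the splits, which is the property the issue is after.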

Compute function test harness

Basic setup is done, but we lack coverage across all compute functions

Closing as stale. Each compute function should define a canonical implementation alongside itself (https://github.com/vortex-data/vortex/issues/3454). That means it should be relatively easy to compare results if we also have #1424

Array metadata to use protobuf

Are you saying this is a good indication of the target size of metadata? It would be worth adding the `metadata_bytes().length()` to that table since that's what's actually in the...

Array metadata to use protobuf

If we're considering Avro, why not protobuffers? The other thing to consider is that we shouldn't enforce the format. Not all encodings need to use the same representation. For the...

Array metadata to use protobuf

Since we changed Vortex arrays to hold metadata in-memory, we no longer need zero-cost reads from serialized metadata. This relaxes the constraints and we should probably just use protobuf by...

`time_elapsed_opening` spends too much time

We have a known fix to avoid constructing the full scan plan up-front, but it seems like the other metrics are a little slow too. Do you mind sharing some properties...

bench spawn, spawn_blocking, unblock

We use spawn as this follows DataFusion's CPU scheduling logic and makes the most sense. spawn_blocking isn't really for CPU-heavy workloads. It's for I/O-bound but blocking workloads. In other...

bench spawn, spawn_blocking, unblock

🤷 it's a bit of both?

> Tokio will spawn more blocking threads when they are requested through this function until the upper limit configured on the [Builder](https://dtantsur.github.io/rust-openstack/tokio/task/struct@crate::runtime::Builder) is reached....

feature: Bloom Filters

Any reason the bloom filters aren't just a binary column in the zone map? In theory, we could write the zone map using a StructLayout such that only the required...
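As a rough illustration of the "binary column" idea, the sketch below keeps a Bloom filter's bits in a plain byte buffer, so each zone's filter could be written as one binary value next to the other zone statistics. Everything here is an assumption for illustration: `ZoneBloom` is a hypothetical type, not Vortex's implementation, and `DefaultHasher` is used only to stay within the standard library. Its algorithm is not guaranteed stable across Rust releases, so a real on-disk format would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-zone Bloom filter backed by a flat byte buffer.
pub struct ZoneBloom {
    bits: Vec<u8>,
    num_hashes: u32,
}

impl ZoneBloom {
    pub fn new(num_bytes: usize, num_hashes: u32) -> Self {
        assert!(num_bytes > 0 && num_hashes > 0);
        Self { bits: vec![0u8; num_bytes], num_hashes }
    }

    /// Rebuild a filter from bytes read back from the binary column.
    pub fn from_bytes(bytes: Vec<u8>, num_hashes: u32) -> Self {
        assert!(!bytes.is_empty() && num_hashes > 0);
        Self { bits: bytes, num_hashes }
    }

    /// The bytes that would be stored as this zone's binary value.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bits
    }

    /// Two base hashes; the i-th probe is derived as a + i * b (double hashing).
    fn hash_pair<T: Hash>(value: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        value.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        value.hash(&mut h2);
        (a, h2.finish() | 1) // force b odd so probes do not all collapse onto a
    }

    fn bit_index(&self, a: u64, b: u64, i: u32) -> usize {
        let num_bits = self.bits.len() as u64 * 8;
        (a.wrapping_add(b.wrapping_mul(u64::from(i))) % num_bits) as usize
    }

    pub fn insert<T: Hash>(&mut self, value: &T) {
        let (a, b) = Self::hash_pair(value);
        for i in 0..self.num_hashes {
            let idx = self.bit_index(a, b, i);
            self.bits[idx / 8] |= 1u8 << (idx % 8);
        }
    }

    /// `false` means the value is definitely absent from the zone;
    /// `true` means it may be present (false positives are possible).
    pub fn might_contain<T: Hash>(&self, value: &T) -> bool {
        let (a, b) = Self::hash_pair(value);
        (0..self.num_hashes).all(|i| {
            let idx = self.bit_index(a, b, i);
            self.bits[idx / 8] & (1u8 << (idx % 8)) != 0
        })
    }
}
```

With this shape, a reader that does not need the filter never touches the bytes, which is the kind of column pruning the comment hints at; whether a StructLayout gives exactly that behaviour is for the thread to settle.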