Nicholas Gates
The benefit of this is that DataFusion can choose to parallelise the rest of the execution plan, whereas in the current model, we do use all threads of the runtime...
Basic setup is done, but we still lack coverage across all compute functions.
Closing as stale. Each compute function should define a canonical implementation alongside itself (https://github.com/vortex-data/vortex/issues/3454). That means it should be relatively easy to compare results if we also have #1424.
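To make the comparison idea concrete, here is a minimal sketch of checking an encoding-specific kernel against a canonical implementation. Every name in it (`RunLengthArray`, `to_canonical`, `canonical_sum`, `check_sum`) is hypothetical and chosen for illustration; none of it is the Vortex API.

```rust
// Hypothetical types and functions for illustration only; not the Vortex API.

/// A toy run-length-encoded array stored as (value, run_length) pairs.
struct RunLengthArray {
    runs: Vec<(i64, usize)>,
}

impl RunLengthArray {
    /// Decode into the canonical (plain, uncompressed) representation.
    fn to_canonical(&self) -> Vec<i64> {
        self.runs
            .iter()
            .flat_map(|&(value, len)| std::iter::repeat(value).take(len))
            .collect()
    }

    /// Encoding-specific kernel: sums the runs without decoding them.
    fn sum(&self) -> i64 {
        self.runs
            .iter()
            .map(|&(value, len)| value * len as i64)
            .sum()
    }
}

/// Canonical (reference) implementation of the same compute function.
fn canonical_sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Returns true when the encoded kernel agrees with the canonical one.
fn check_sum(array: &RunLengthArray) -> bool {
    array.sum() == canonical_sum(&array.to_canonical())
}
```

With a canonical implementation defined next to each compute function, a check of this shape could be generated for every encoding rather than written by hand.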
Are you saying this is a good indication of the target size of metadata? It would be worth adding the `metadata_bytes().length()` to that table since that's what's actually in the...
If we're considering Avro, why not protobuffers? The other thing to consider is that we shouldn't enforce the format. Not all encodings need to use the same representation. For the...
Since we changed Vortex arrays to hold metadata in-memory, we no longer need zero-cost reads from serialized metadata. This relaxes the constraints and we should probably just use protobuf by...
We have a known fix to avoid constructing the full scan plan up-front, but it seems like the other metrics are a little slow too. Do you mind sharing some properties...
We use spawn as this follows DataFusion's CPU scheduling logic and makes most sense. spawn_blocking isn't really for CPU heavy workloads. It's for I/O bound but blocking workloads. In other...
🤷 it's a bit of both?
> Tokio will spawn more blocking threads when they are requested through this function until the upper limit configured on the [Builder](https://dtantsur.github.io/rust-openstack/tokio/task/struct@crate::runtime::Builder) is reached....
Any reason the bloom filters aren't just a binary column in the zone map? In theory, we could write the zone map using a StructLayout such that only the required...
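As a rough illustration of the question above, a bloom filter could sit next to the min/max statistics as a plain byte column in each zone's row. The sketch below assumes a fixed 256-bit filter and three hash probes; `ZoneStats`, `build_zone`, and `might_contain` are made-up names, and it does not reflect the actual Vortex zone map or layout code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Assumed parameters, chosen only for this sketch.
const BLOOM_BYTES: usize = 32; // 256 bits per zone
const NUM_HASHES: u64 = 3;

/// One row of a hypothetical zone map: min/max plus the bloom filter
/// stored as an ordinary binary value.
struct ZoneStats {
    min: i64,
    max: i64,
    bloom: Vec<u8>,
}

/// Bit positions probed for a value, one per seeded hash.
fn bit_positions(value: i64) -> impl Iterator<Item = usize> {
    (0..NUM_HASHES).map(move |seed| {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        (hasher.finish() % (BLOOM_BYTES as u64 * 8)) as usize
    })
}

/// Build the statistics row for one zone of values.
fn build_zone(values: &[i64]) -> ZoneStats {
    let mut bloom = vec![0u8; BLOOM_BYTES];
    for &value in values {
        for bit in bit_positions(value) {
            bloom[bit / 8] |= 1u8 << (bit % 8);
        }
    }
    ZoneStats {
        // An empty zone gets an inverted range, so every lookup is pruned.
        min: values.iter().copied().min().unwrap_or(i64::MAX),
        max: values.iter().copied().max().unwrap_or(i64::MIN),
        bloom,
    }
}

/// False means the zone can be skipped; true means it must be read.
fn might_contain(zone: &ZoneStats, value: i64) -> bool {
    if value < zone.min || value > zone.max {
        return false;
    }
    bit_positions(value).all(|bit| (zone.bloom[bit / 8] & (1u8 << (bit % 8))) != 0)
}
```

Because the filter is just another column in this sketch, a reader that only needs min/max pruning would not have to load the `bloom` bytes at all, which is the property the column-per-statistic layout is meant to provide.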