Ben Chambers
Specifically, `Field` and `Fields` were added and put behind `Arc` to avoid cloning: https://github.com/apache/arrow-rs/issues/3955
FWIW: This relates at least partially to specialization and efficiency of the inner loops. I suspect there are ways to use some generic parameters to still get specialization, but some...
Some of this may be done as part of building the new partitioned execution logic (as part of #409).
In general -- started working on this to allow operating on many and/or large files without filling up the disk. First PR(s) are ready for review. @epinzur re `getMetadata()` and...
Capturing some links / thoughts:

* Example of getting the minimum/maximum time from the parquet metadata (the file stats): https://github.com/kaskada-ai/kaskada/blob/7858a62bc26c4ffd2451336d6d4dee82bd393fab/crates/sparrow-runtime/src/metadata/prepared_metadata.rs#L45
* Fetching the schema is likely done by https://github.com/kaskada-ai/kaskada/blob/main/crates/sparrow-runtime/src/metadata/raw_metadata.rs
I don't know that I like `concat` as an aggregation. Specifically, the default behavior (concatenate all the items) is decidedly degenerate. I think we may be better off with something...
There are a lot of places where we refer to the key columns (`_time`, etc.), which would benefit from having a single static `FieldRef` we could reference in all those places.
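A minimal sketch of what a single shared field could look like. To stay free of dependencies it uses a hypothetical stand-in `Field` struct rather than arrow's `Field`; the accessor name `time_field` and the `nullable: false` setting are assumptions for illustration, not the existing code.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for arrow's `Field`, reduced to what this sketch needs.
#[derive(Debug, PartialEq)]
pub struct Field {
    pub name: &'static str,
    pub nullable: bool,
}

/// Mirrors arrow's `FieldRef` alias, which is an `Arc<Field>`.
pub type FieldRef = Arc<Field>;

/// Returns the shared `_time` key-column field, building it on first use.
/// (Assumption: the key column is non-nullable.)
pub fn time_field() -> FieldRef {
    static TIME: OnceLock<FieldRef> = OnceLock::new();
    // Cloning an `Arc` only bumps a reference count; the `Field` is not copied.
    TIME.get_or_init(|| {
        Arc::new(Field {
            name: "_time",
            nullable: false,
        })
    })
    .clone()
}
```

Every call site would then use `time_field()` instead of constructing its own field, so all of them share one allocation.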
Created a repro repository. The workflow is set up to use skip jobs. It ignores changes to `README.md`, but the test step *always* fails (`false`). https://github.com/bjchambers/skip-actions-repro/blob/main/.github/workflows/workflow.yml I made a PR that...
Arrow recently added [`Scalar`](https://docs.rs/arrow/latest/arrow/array/struct.Scalar.html) which simplifies many of the APIs handling scalar values. We could potentially use this to replace our [`ScalarValue`](https://github.com/kaskada-ai/kaskada/blob/main/crates/sparrow-arrow/src/scalar_value.rs) enum. See https://github.com/apache/arrow-rs/pull/4793/files for some notes:

1. We...
@jordanrfrazier FYI -- I think we should be able to completely replace our ScalarValue enum with `Scalar`. I think it's probably worth doing that for how we serialize scalars in...