vortex
vortex copied to clipboard
Tracking Issue: Fixed-Size List Arrays
Tracking Issue: Fixed-Size List Arrays
This is a tracking issue for adding fixed-size list arrays to Vortex.
The main difference between lists and fixed-size lists is that fixed-size lists do not need to store an offsets buffer, since we are able to use the predictable stride pattern to jump to data. This allows us to more efficiently store data with known size (like matrices or tensors) as well as compute more efficiently with SIMD.
Public API
pub enum DType {
/// A logical fixed-size list type.
///
/// This is parameterized by a `DType` that represents the element type of the inner lists, as
/// well as a `u32` size that determines the fixed length of each `FixedSizeList` scalar.
FixedSizeList(Arc<DType>, u32, Nullability),
}
#[cfg(not(target_arch = "wasm32"))]
const_assert_eq!(size_of::<DType>(), 16);
#[cfg(target_arch = "wasm32")]
const_assert_eq!(size_of::<DType>(), 12); // Increase size to 12 bytes because of the `u32` field.
Prior Art
- Apache Arrow Fixed-Size List Overview: https://arrow.apache.org/docs/format/Intro.html#fixed-size-list
- Apache Arrow Fixed-Size List Layout: https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout
- Apache Arrow Variable-Size List Layout: https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout
- Apache Arrow Fixed-shape Tensor: https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor
- Arrow Rust
FixedSizeListArray: https://docs.rs/arrow/latest/arrow/array/struct.FixedSizeListArray.html pyarrow.FixedSizeListType: https://arrow.apache.org/docs/python/generated/pyarrow.FixedSizeListType.html- DuckDB Array Type (the same as fixed-size list): https://duckdb.org/docs/stable/sql/data_types/array.html
- Adding a new variant (
DType::Decimal) to Vortex: https://github.com/vortex-data/vortex/pull/3058 - Conversion to numpy array potential optimizations: https://github.com/apache/arrow/issues/ 35622 (so we don't ping that issue)
Steps / History
- [x] https://github.com/vortex-data/vortex/pull/4371
- [x] https://github.com/vortex-data/vortex/pull/4380
- [x] https://github.com/vortex-data/vortex/pull/4385
- [X] https://github.com/vortex-data/vortex/pull/4405
- [X] https://github.com/vortex-data/vortex/pull/4428
- [x] https://github.com/vortex-data/vortex/pull/4495
- [x] https://github.com/vortex-data/vortex/pull/4541
- [x] https://github.com/vortex-data/vortex/pull/4545
- [x] https://github.com/vortex-data/vortex/pull/4590
- [x] https://github.com/vortex-data/vortex/pull/4592
- [x] https://github.com/vortex-data/vortex/pull/4601
- [x] https://github.com/vortex-data/vortex/pull/4610
- [x] https://github.com/vortex-data/vortex/pull/4618
Unresolved Questions
- [ ] Is it okay to increase the size of
DTypeto 12 bytes on WASM (instead of the previous size of 8)? Or instead have anFixedListElementTypethatArcs both theDTypeand theu32size? If we do that thenListshould also be changed - [ ] Is
u32the correct type to use here (Arrow usesi32for Java)
FixedSizedList doesn't actually increase DType size (which was already 16 and remains 16)
It has increased it for wasm from 8 to 12. Though don’t think it matters