vortex
A toolkit for working with compressed array data
Currently at 80 bytes, mostly because it stores the dtype
Currently at 48 bytes; could reasonably go down to 16
We should have:
• LocalTime ([time unit] after midnight) - Arrow time32 or time64
• LocalDate (Julian day) - Arrow date32
• LocalDateTime (Julian day and [time unit] after midnight)...
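A minimal Rust sketch of the three proposed logical types, assuming integer storage and a hypothetical `TimeUnit` enum; the methods `to_date32` and `to_epoch_ticks` are also hypothetical. Arrow's date32 counts days since the Unix epoch rather than a Julian day, so the sketch converts using the Julian Day Number of 1970-01-01 (2440588).

```rust
/// Hypothetical time unit. Arrow time32 covers seconds/milliseconds,
/// time64 covers microseconds/nanoseconds.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

impl TimeUnit {
    /// Number of ticks of this unit in one day.
    fn ticks_per_day(self) -> i64 {
        let per_second: i64 = match self {
            TimeUnit::Seconds => 1,
            TimeUnit::Milliseconds => 1_000,
            TimeUnit::Microseconds => 1_000_000,
            TimeUnit::Nanoseconds => 1_000_000_000,
        };
        86_400 * per_second
    }
}

/// Julian Day Number of 1970-01-01 (the Unix epoch).
const UNIX_EPOCH_JDN: i32 = 2_440_588;

/// [time unit] ticks after midnight.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalTime {
    ticks: i64,
    unit: TimeUnit,
}

/// Julian day number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalDate {
    julian_day: i32,
}

impl LocalDate {
    /// Days since the Unix epoch, the representation Arrow date32 uses.
    fn to_date32(self) -> i32 {
        self.julian_day - UNIX_EPOCH_JDN
    }
}

/// Julian day plus [time unit] ticks after midnight.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalDateTime {
    date: LocalDate,
    time: LocalTime,
}

impl LocalDateTime {
    /// Ticks since the Unix epoch, expressed in the time's unit.
    fn to_epoch_ticks(self) -> i64 {
        self.date.to_date32() as i64 * self.time.unit.ticks_per_day() + self.time.ticks
    }
}

fn main() {
    let dt = LocalDateTime {
        date: LocalDate { julian_day: UNIX_EPOCH_JDN + 1 },
        time: LocalTime { ticks: 1_500, unit: TimeUnit::Milliseconds },
    };
    println!("{}", dt.to_epoch_ticks()); // 86401500
}
```

Keeping the date as a day count and the time as a unit-tagged tick count lets both halves map directly onto Arrow's date32 and time32/time64 without any calendar arithmetic beyond a constant offset.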
Should be computed separately from the existing stats, to avoid paying the overhead when no compression is being performed.
It's misleading to have `new()` call `try_new().unwrap()`, since this obscures the lack of safety. I'd propose we have a `new_unchecked` and a `new()` that returns a `Result`, where unchecked can be used as an...
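A minimal sketch of the proposed constructor pair on a hypothetical `OffsetsArray` whose invariant is that offsets never decrease. `new` validates and returns a `Result`; `new_unchecked` skips validation and is marked `unsafe` so the caller's obligation is visible at the call site. The type, error, and invariant are illustrative assumptions, not vortex's API.

```rust
/// Hypothetical error returned when the invariant does not hold.
#[derive(Debug, PartialEq)]
struct InvalidOffsets(&'static str);

/// Hypothetical array whose offsets must be non-decreasing.
#[derive(Debug)]
struct OffsetsArray {
    offsets: Vec<u32>,
}

impl OffsetsArray {
    /// Validates the invariant and reports failure instead of panicking.
    fn new(offsets: Vec<u32>) -> Result<Self, InvalidOffsets> {
        if offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err(InvalidOffsets("offsets must be non-decreasing"));
        }
        // SAFETY: the invariant was checked just above.
        Ok(unsafe { Self::new_unchecked(offsets) })
    }

    /// Skips validation for callers that already know the input is valid.
    ///
    /// # Safety
    /// `offsets` must be non-decreasing; this is only checked in debug builds.
    unsafe fn new_unchecked(offsets: Vec<u32>) -> Self {
        debug_assert!(offsets.windows(2).all(|w| w[0] <= w[1]));
        Self { offsets }
    }
}

fn main() {
    println!("{:?}", OffsetsArray::new(vec![0, 3, 1])); // Err(...)
    let checked = OffsetsArray::new(vec![0, 2, 5]).expect("sorted offsets");
    // SAFETY: these offsets are known to be sorted.
    let unchecked = unsafe { OffsetsArray::new_unchecked(vec![0, 2, 5]) };
    println!("{:?} {:?}", checked, unchecked);
}
```

Whether `new_unchecked` should be `unsafe fn` depends on whether a broken invariant can lead to memory unsafety downstream; if it only produces wrong results, a safe function with a prominent doc comment may be enough.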
Is it worth performing two passes to build a sorted dictionary to encode values with? There are likely cases where a sorted dictionary proves beneficial, but I'm not sure of the tradeoffs.
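A minimal sketch of the two-pass approach, assuming values that implement `Ord`: the first pass collects distinct values in sorted order, the second maps each value to its dictionary index with a binary search. The function name `sorted_dict_encode` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Two-pass sorted dictionary encoding: returns (dictionary, codes).
fn sorted_dict_encode<T: Ord + Clone>(values: &[T]) -> (Vec<T>, Vec<u32>) {
    // Pass 1: distinct values, sorted by the BTreeSet.
    let dict: Vec<T> = values
        .iter()
        .cloned()
        .collect::<BTreeSet<T>>()
        .into_iter()
        .collect();
    // Pass 2: each value's position in the sorted dictionary.
    let codes = values
        .iter()
        .map(|v| dict.binary_search(v).expect("value is in dictionary") as u32)
        .collect();
    (dict, codes)
}

fn main() {
    let (dict, codes) = sorted_dict_encode(&["b", "a", "c", "a"]);
    println!("{:?} {:?}", dict, codes); // ["a", "b", "c"] [1, 0, 2, 0]
}
```

The benefit of sorting is that code order matches value order, so a predicate such as `value < "b"` can be evaluated as `code < 1` without decoding. A single-pass encoder that assigns codes in first-seen order (for example via a `HashMap`) avoids the extra pass and the O(log n) lookups but loses that property.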
The stats of a compressed array should equal the stats of the uncompressed array. Further, the stats from the compressed array can be used to populate the stats of compressed...
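A minimal sketch of the property, using a hypothetical run-length encoded array: its min/max can be read from the short list of run values without decompressing, and must match the min/max of the decompressed data. The sketch assumes every run length is at least 1; a zero-length run would contribute a value that never appears in the output and break the equality.

```rust
/// Hypothetical RLE array: `values[i]` repeats `run_lengths[i]` times.
/// Assumes every run length is at least 1.
struct RunLengthArray {
    values: Vec<i64>,
    run_lengths: Vec<usize>,
}

impl RunLengthArray {
    /// Expands the runs back into a plain vector.
    fn decompress(&self) -> Vec<i64> {
        self.values
            .iter()
            .zip(&self.run_lengths)
            .flat_map(|(&v, &n)| std::iter::repeat(v).take(n))
            .collect()
    }

    /// Min/max computed from the run values only, without decompressing.
    fn min_max(&self) -> Option<(i64, i64)> {
        min_max_plain(&self.values)
    }
}

/// Min/max of an uncompressed slice; `None` when empty.
fn min_max_plain(xs: &[i64]) -> Option<(i64, i64)> {
    let min = *xs.iter().min()?;
    let max = *xs.iter().max()?;
    Some((min, max))
}

fn main() {
    let rle = RunLengthArray { values: vec![3, -2, 7], run_lengths: vec![4, 1, 2] };
    // Stats of the compressed form must equal stats of the uncompressed form.
    assert_eq!(rle.min_max(), min_max_plain(&rle.decompress()));
    println!("{:?}", rle.min_max()); // Some((-2, 7))
}
```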
Since dtype is a logical type, we should distinguish between a uint8 and a byte (with an underlying u8 ptype). This will allow us to apply different compression strategies, e.g. not much point...
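A minimal sketch of the idea that two logical types can share one physical type yet get different candidate compression schemes. `LogicalType`, `PType`, `Scheme`, and `candidate_schemes` are hypothetical names, and the scheme lists are placeholders: which encodings actually suit opaque bytes is left open here.

```rust
/// Hypothetical physical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PType {
    U8,
}

/// Hypothetical logical types that share the same physical representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    UInt8,
    Byte,
}

impl LogicalType {
    /// Both logical types are stored as u8.
    fn ptype(self) -> PType {
        match self {
            LogicalType::UInt8 | LogicalType::Byte => PType::U8,
        }
    }
}

/// Hypothetical compression schemes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scheme {
    FrameOfReference,
    BitPacking,
    Dictionary,
    Uncompressed,
}

/// The compressor dispatches on the logical type, not the physical one.
/// Placeholder lists; only the dispatch mechanism is the point here.
fn candidate_schemes(dtype: LogicalType) -> &'static [Scheme] {
    match dtype {
        LogicalType::UInt8 => &[
            Scheme::FrameOfReference,
            Scheme::BitPacking,
            Scheme::Dictionary,
            Scheme::Uncompressed,
        ],
        LogicalType::Byte => &[Scheme::Dictionary, Scheme::Uncompressed],
    }
}

fn main() {
    for dtype in [LogicalType::UInt8, LogicalType::Byte] {
        println!("{:?} ({:?}): {:?}", dtype, dtype.ptype(), candidate_schemes(dtype));
    }
}
```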
As a chunked array loops over its chunks, the compressor could pick the most recently used compression scheme provided the resulting compression ratio remains within some bound. This would short-cut...
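A minimal sketch of the short-cut, under the assumption that "within some bound" means the reused scheme's ratio on the new chunk is at least `(1 - tolerance)` times the ratio it achieved when a full search last selected it. The `Scheme` struct, the toy `raw`/`rle` size estimators, and `choose_schemes` are all hypothetical.

```rust
/// Hypothetical scheme: a name plus a function reporting the compressed
/// size (in bytes) a chunk would have under this scheme.
#[derive(Clone, Copy)]
struct Scheme {
    name: &'static str,
    compressed_size: fn(&[u8]) -> usize,
}

/// Toy scheme: store bytes as-is.
fn raw_size(chunk: &[u8]) -> usize {
    chunk.len()
}

/// Toy scheme: one (value, length) byte pair per run.
fn rle_size(chunk: &[u8]) -> usize {
    if chunk.is_empty() {
        return 0;
    }
    2 * (1 + chunk.windows(2).filter(|w| w[0] != w[1]).count())
}

/// Compression ratio (uncompressed / compressed); higher is better.
fn ratio(chunk: &[u8], scheme: &Scheme) -> f64 {
    chunk.len() as f64 / (scheme.compressed_size)(chunk).max(1) as f64
}

/// Reuses the most recent scheme while its ratio stays within `tolerance`
/// of its baseline; otherwise evaluates every scheme.
fn choose_schemes(chunks: &[&[u8]], schemes: &[Scheme], tolerance: f64) -> Vec<&'static str> {
    let mut chosen = Vec::with_capacity(chunks.len());
    // (scheme, ratio it achieved when picked by the last full search)
    let mut last: Option<(Scheme, f64)> = None;
    for &chunk in chunks {
        if let Some((scheme, baseline)) = last {
            // Fast path: a single evaluation instead of one per scheme.
            if ratio(chunk, &scheme) >= baseline * (1.0 - tolerance) {
                chosen.push(scheme.name);
                continue;
            }
        }
        // Slow path: evaluate every scheme and keep the best ratio.
        let (best, best_ratio) = schemes
            .iter()
            .map(|s| (*s, ratio(chunk, s)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .expect("at least one scheme");
        chosen.push(best.name);
        last = Some((best, best_ratio));
    }
    chosen
}

fn main() {
    let schemes = [
        Scheme { name: "raw", compressed_size: raw_size },
        Scheme { name: "rle", compressed_size: rle_size },
    ];
    let a = vec![7u8; 100];
    let mut b = vec![7u8; 90];
    b.extend(std::iter::repeat(8u8).take(10));
    let c: Vec<u8> = (0..100u8).collect();
    let picks = choose_schemes(&[a.as_slice(), b.as_slice(), c.as_slice()], &schemes, 0.6);
    println!("{:?}", picks); // ["rle", "rle", "raw"]
}
```

Keeping the baseline fixed at the last full-search ratio, rather than updating it on every reuse, stops the accepted ratio from drifting downward chunk after chunk.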
For floats that fail ALP encoding (e.g., logical "real" numbers rather than fixed-point decimals stored as FP)
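A simplified sketch of how such floats could be detected: a value "fails" if no decimal exponent `e` up to a chosen limit makes `round(v * 10^e) / 10^e` reproduce `v` exactly. This is a per-value simplification of ALP, which picks one exponent/factor pair per vector; `alp_roundtrips`, `alp_exception_rate`, and the exponent limit are hypothetical.

```rust
/// Does `v` round-trip exactly as an integer scaled by 10^e, for some e <= max_exponent?
fn alp_roundtrips(v: f64, max_exponent: i32) -> bool {
    (0..=max_exponent).any(|e| {
        let scale = 10f64.powi(e);
        let encoded = (v * scale).round();
        // Reject integers too large to represent exactly in an f64 (and NaN).
        encoded.abs() < 2f64.powi(53) && encoded / scale == v
    })
}

/// Fraction of values that do not round-trip; a high rate suggests "real"
/// numbers that should go to a different float encoding.
fn alp_exception_rate(values: &[f64], max_exponent: i32) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let failures = values.iter().filter(|&&v| !alp_roundtrips(v, max_exponent)).count();
    failures as f64 / values.len() as f64
}

fn main() {
    let data = [1.25, 0.5, std::f64::consts::PI, 2.0];
    println!("{}", alp_exception_rate(&data, 4)); // 0.25
}
```

With a larger exponent limit, values such as PI can eventually round-trip too, but only via very large encoded integers that compress poorly, which is why a bounded search is used here.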