arrow-julia
Add RunEndEncoded Array Support
Implements Arrow's RunEndEncoded (REE) layout as defined in the Arrow format specification. REE is a run-length encoding variant that stores arrays with repeated values compactly using two child arrays: run_ends (the cumulative logical index at which each run ends) and values (one value per run).
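As a rough illustration of the layout described above, here is a minimal sketch in plain Julia that does not touch Arrow.jl. The helpers `ree_encode` and `ree_decode` are hypothetical names introduced only for this example; they are not functions added by this PR. The sketch assumes 1-based logical indices, so each entry of run_ends is the last logical index covered by its run.

```julia
# Hypothetical helpers for illustration only; not part of Arrow.jl or this PR.

# Run-end encode a plain vector into (run_ends, values).
function ree_encode(v::AbstractVector{T}) where {T}
    run_ends = Int32[]
    values = T[]
    for (i, x) in enumerate(v)
        if !isempty(values) && isequal(values[end], x)
            run_ends[end] = i   # same value: extend the current run
        else
            push!(values, x)    # new value: start a new run
            push!(run_ends, i)
        end
    end
    return run_ends, values
end

# Expand (run_ends, values) back into a dense vector.
function ree_decode(run_ends, values)
    n = isempty(run_ends) ? 0 : Int(run_ends[end])
    out = similar(values, n)
    start = 1
    for (stop, x) in zip(run_ends, values)
        for j in start:stop
            out[j] = x
        end
        start = stop + 1
    end
    return out
end

run_ends, values = ree_encode(["a", "a", "a", "b", "b", "c"])
# run_ends == Int32[3, 5, 6], values == ["a", "b", "c"]
```

Because `isequal` is used to compare neighbours, consecutive `missing` entries collapse into a single run in this sketch; how the PR itself treats nulls may differ.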
Implementation
- Core type: Added Arrow.RunEndEncoded{T,R,A} struct with O(log n) binary search indexing
- Type system: Registered RunEndEncodedKind in ArrowTypes module
- Serialization: Implemented arrowvector() and makenodesbuffers!() for writing REE arrays
- Deserialization: Added build() function and juliaeltype() for reading REE arrays from Arrow IPC format
- Interoperability: Validated against PyArrow-generated test files (included as fixtures)
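The O(log n) indexing mentioned in the list above can be sketched as a binary search over run_ends. `SketchREE` below is a hypothetical stand-in written for this example, not the Arrow.RunEndEncoded{T,R,A} type from the PR, whose fields and constructor may look different. It assumes run_ends is strictly increasing and uses 1-based logical indices.

```julia
# Hypothetical stand-in for illustration; not the type added by this PR.
struct SketchREE{T,R<:Integer} <: AbstractVector{T}
    run_ends::Vector{R}   # last logical index of each run (strictly increasing)
    values::Vector{T}     # one value per run
end

Base.IndexStyle(::Type{<:SketchREE}) = IndexLinear()

# The logical length is the final run end, or 0 for an empty array.
Base.size(a::SketchREE) = (isempty(a.run_ends) ? 0 : Int(a.run_ends[end]),)

function Base.getindex(a::SketchREE, i::Int)
    @boundscheck checkbounds(a, i)
    # Binary search: first run whose end is >= i contains logical index i.
    run = searchsortedfirst(a.run_ends, i)
    return a.values[run]
end

a = SketchREE(Int32[3, 5, 6], ["a", "b", "c"])
a[3]        # "a" (last element of the first run)
a[4]        # "b" (first element of the second run)
collect(a)  # ["a", "a", "a", "b", "b", "c"]
```

Subtyping `AbstractVector` means iteration, `collect`, and bounds checking come from Base once `size` and `getindex` are defined, at the cost of one binary search per element access.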
Testing
- Cross-language validation using PyArrow 20.0.0-generated test files
- Round-trip tests across data types (integers, floats, strings, booleans), including arrays with nulls
- Edge cases: single runs, alternating values, long runs
Closes #476