datafusion-ballista
Consider using `with_skip_validation` for shuffle file reading
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Improve shuffle reading: now that https://github.com/apache/arrow-rs/pull/7120 is merged, consider using `with_skip_validation` when reading shuffle files.
Describe the solution you'd like
Change the shuffle reader to use the new API.
Describe alternatives you've considered
Keep everything as it is.
Additional context
- we could try to address #944 at the same time
/take
For reference:
- Benchmarks for Arrow IPC reader
- Benchmarks for Arrow IPC writer
- Add with_skip_validation flag to IPC StreamReader, FileReader and FileDecoder
WIP Improve
- Improve Spill Performance: mmap the spill files
- A collection of tickets for improving sorting larger than memory datasets / spilling sorts
Based on arrow-rs
The minimum arrow-rs release that supports `with_skip_validation` is 54.3.0.