Better IO orchestration
The Vortex reader collects all read requests from layouts and dispatches them together: https://github.com/spiraldb/vortex/blob/develop/vortex-serde/src/layouts/read/stream.rs#L192. However, this is extremely naive: it doesn't use the additional knowledge we have about the file format to prioritize requests or prefetch data.
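To make "prioritize requests" concrete, here is a minimal sketch of ordering a collected batch before dispatch. It is not the current reader's code: `ReadRequest`, its `priority` field, and `dispatch_order` are hypothetical names, and how a priority would actually be derived from the file format is left open.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical read request; not a type from the Vortex codebase.
#[derive(Debug, PartialEq, Eq)]
struct ReadRequest {
    /// Higher means more urgent. How this is assigned is an assumption.
    priority: u8,
    offset: u64,
    len: u64,
}

impl Ord for ReadRequest {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap: higher priority pops first.
        // Ties are broken in favour of the lower offset (then shorter read),
        // so the comparison on those fields is reversed.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.offset.cmp(&self.offset))
            .then_with(|| other.len.cmp(&self.len))
    }
}

impl PartialOrd for ReadRequest {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the requests in the order they would be dispatched.
fn dispatch_order(requests: Vec<ReadRequest>) -> Vec<ReadRequest> {
    let mut heap = BinaryHeap::from(requests);
    let mut ordered = Vec::with_capacity(heap.len());
    while let Some(request) = heap.pop() {
        ordered.push(request);
    }
    ordered
}
```

With this ordering, two requests of equal priority are issued in ascending offset order, which keeps reads within a priority class sequential.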
AnyBlob outlines techniques for improving throughput and latency on blob stores. We should explore methods outlined in the paper and try to incorporate them into vortex file reading.
In no particular order, the things to look at are:
- Better queuing: we shouldn't wait for an arbitrary chunk to finish reading if we already have chunks we can process
- Performing IO via io_uring (on Linux) and minimising copies
- On-disk caching to allow deduplication across files: we could checksum Vortex batches and use an on-disk cache to avoid loading the same batch more than once
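As an illustration of the queuing point, the sketch below processes chunks in whatever order their reads complete instead of waiting on a fixed one. It uses only the standard library, with threads and a channel standing in for real async IO; `read_chunk` and `process_in_completion_order` are hypothetical names, and the sleep merely simulates read latency.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a chunk read; the sleep simulates IO latency.
fn read_chunk(index: usize, latency_ms: u64) -> (usize, Vec<u8>) {
    thread::sleep(Duration::from_millis(latency_ms));
    (index, vec![index as u8; 4])
}

/// Issues all reads concurrently and handles each chunk as soon as it arrives.
/// Returns the chunk indices in the order they were processed.
fn process_in_completion_order(latencies_ms: &[u64]) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (index, &latency) in latencies_ms.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Ignore the error: it only occurs if the receiver has gone away.
            let _ = tx.send(read_chunk(index, latency));
        });
    }
    // Drop the original sender so the receive loop ends once all reads finish.
    drop(tx);

    let mut processed = Vec::new();
    for (index, _bytes) in rx {
        // Decoding of `_bytes` would happen here, without waiting on slower reads.
        processed.push(index);
    }
    processed
}
```

Calling `process_in_completion_order(&[30, 0, 10])` processes all three chunks, and a slow first chunk does not hold up the other two. The exact order depends on timing, so nothing should rely on it.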