Better IO orchestration

Open robert3005 opened this issue 1 year ago • 0 comments

The Vortex reader collects all read requests from layouts and dispatches them together: https://github.com/spiraldb/vortex/blob/develop/vortex-serde/src/layouts/read/stream.rs#L192. However, this is extremely naive and doesn't leverage the additional knowledge we have about the file format to prioritize requests and prefetch data.
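
To make the prioritization idea concrete, here is a minimal sketch of ordering collected read requests before dispatch. Everything in it is hypothetical: `ReadRequest`, its `priority` field and `dispatch_order` are illustration names, not existing Vortex types, and the priority values are assumed to come from whatever the layouts know about which bytes are needed first.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical read request: a byte range plus a priority hint.
/// Not an existing Vortex type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReadRequest {
    pub offset: u64,
    pub len: u64,
    /// Lower value means "needed sooner" (assumption for this sketch).
    pub priority: u32,
}

impl Ord for ReadRequest {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap, so comparisons are reversed: the request
        // with the lowest priority value, then the lowest offset, pops first.
        other
            .priority
            .cmp(&self.priority)
            .then_with(|| other.offset.cmp(&self.offset))
            .then_with(|| other.len.cmp(&self.len))
    }
}

impl PartialOrd for ReadRequest {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the requests in the order they would be dispatched.
pub fn dispatch_order(requests: Vec<ReadRequest>) -> Vec<ReadRequest> {
    let mut heap = BinaryHeap::from(requests);
    let mut ordered = Vec::with_capacity(heap.len());
    while let Some(request) = heap.pop() {
        ordered.push(request);
    }
    ordered
}
```

Breaking ties by offset keeps equal-priority requests in file order, which would also make it easier to coalesce adjacent ranges later.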

AnyBlob outlines techniques for improving throughput and latency on blob stores. We should explore the methods outlined in the paper and try to incorporate them into Vortex file reading.

In no particular order, the things to look at are:

- Better queuing: we shouldn't wait for an arbitrary chunk to finish reading if we already have chunks we can process.
- Performing IO via io_uring (on Linux) and minimising copies.
- On-disk caching to allow deduplication across files: we could checksum Vortex batches and use an on-disk cache so the same batch isn't loaded more than once.
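
As a rough illustration of the queuing point above, the sketch below starts several reads concurrently and handles each chunk as soon as it arrives instead of waiting in request order. It uses only the standard library; `read_chunk` and `process_in_completion_order` are hypothetical names, and the sleep merely stands in for variable blob-store latency.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Simulated read (hypothetical): `delay_ms` stands in for variable latency.
fn read_chunk(id: usize, delay_ms: u64) -> (usize, Vec<u8>) {
    thread::sleep(Duration::from_millis(delay_ms));
    (id, vec![id as u8; 4])
}

/// Starts all reads at once and returns the chunk ids in the order
/// in which they finished and were processed.
pub fn process_in_completion_order(delays_ms: &[u64]) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (id, &delay) in delays_ms.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Ignore the send error: it only occurs if the receiver is gone.
            let _ = tx.send(read_chunk(id, delay));
        });
    }
    // Drop the original sender so the receive loop ends once every
    // worker has sent its result and dropped its clone.
    drop(tx);

    let mut processed = Vec::new();
    for (id, bytes) in rx {
        // Real decoding would happen here; the sketch only records the id.
        debug_assert_eq!(bytes.len(), 4);
        processed.push(id);
    }
    processed
}
```

Calling `process_in_completion_order(&[300, 10, 150])` processes the fast chunks while the slow one is still in flight. In async code the same pattern is usually expressed with something like `FuturesUnordered` from the `futures` crate rather than threads; whether that fits the existing reader is an open question.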

Oct 02 '24 16:10 robert3005