seafowl
Caching object store improvements: chunk coalescing
This PR introduces a new caching mechanism for fetching byte ranges:
- If a chunk is not present in the cache, greedily extend the range to fetch by coalescing adjacent chunks that are also missing from the cache, so as to minimize the number of outgoing requests.
- For each such chunk, insert a `Pending` cache value containing a channel over which the actual bytes will be sent.
- If a task runs into a `Pending` value, it takes the receiver and waits on it with a timeout.
- Once no more chunks can be coalesced, issue a `get_range` request for the extended range.
- If that request errors out, send the error to any waiting tasks; otherwise, cache the bytes as the `Memory` variant and trigger persisting them to disk (the `File` value variant).
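The coalescing step in the first bullet can be sketched as a pure function over chunk indices. This is a minimal sketch under assumptions not stated in the PR: chunks have a fixed `chunk_size`, the cache is modeled as a `HashSet` of cached chunk indices, and `coalesce_missing` is a hypothetical name. Each run of consecutive missing chunks becomes one byte range, so each run costs a single outgoing request.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical sketch: split a requested byte range into fixed-size chunks
/// and group consecutive chunks that are missing from the cache into a single
/// byte range, so each run of misses needs only one outgoing request.
fn coalesce_missing(
    range: Range<u64>,
    chunk_size: u64,
    cached: &HashSet<u64>,
    object_len: u64,
) -> Vec<Range<u64>> {
    let mut fetches = Vec::new();
    if range.start >= range.end {
        return fetches;
    }
    let first_chunk = range.start / chunk_size;
    let last_chunk = (range.end - 1) / chunk_size;

    // Index of the first chunk in the current run of missing chunks, if any.
    let mut run_start: Option<u64> = None;
    for chunk in first_chunk..=last_chunk {
        if cached.contains(&chunk) {
            // A cached chunk ends the current run: emit one fetch for it.
            if let Some(start) = run_start.take() {
                fetches.push(start * chunk_size..(chunk * chunk_size).min(object_len));
            }
        } else if run_start.is_none() {
            // First missing chunk of a new run.
            run_start = Some(chunk);
        }
    }
    // Flush a run that extends to the last requested chunk,
    // clamping the end to the object length.
    if let Some(start) = run_start {
        fetches.push(start * chunk_size..((last_chunk + 1) * chunk_size).min(object_len));
    }
    fetches
}

fn main() {
    // Chunk 2 is cached; chunks 0-1 and 3-5 are missing.
    let cached = HashSet::from([2u64]);
    let fetches = coalesce_missing(5..55, 10, &cached, 100);
    println!("{:?}", fetches); // [0..20, 30..60]
}
```

Here five missing chunks cost two requests instead of five; the actual implementation may bound how far a range is extended, which this sketch does not model.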
The timeout is important for breaking some hangs I've noticed (notably in TPC-H SF10 queries q18 and q21).
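A minimal sketch of the timed wait on a `Pending` value, using `std::sync::mpsc::Receiver::recv_timeout` as a stand-in for whatever channel the implementation actually uses. The real code is likely async, and a single-consumer `mpsc` channel only fits the case where one task takes the receiver. `wait_for_pending`, `ChunkResult`, and the `String` error type are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical payload sent over the `Pending` channel: the bytes or an error.
type ChunkResult = Result<Vec<u8>, String>;

/// Wait for the task fetching a pending chunk, but give up after `timeout`
/// so a stalled or lost sender cannot hang the caller forever.
fn wait_for_pending(rx: &Receiver<ChunkResult>, timeout: Duration) -> ChunkResult {
    match rx.recv_timeout(timeout) {
        // The fetching task sent either the bytes or the request's error.
        Ok(result) => result,
        Err(RecvTimeoutError::Timeout) => Err("timed out waiting for pending chunk".to_string()),
        // The fetching task went away without sending anything.
        Err(RecvTimeoutError::Disconnected) => Err("fetching task dropped the sender".to_string()),
    }
}

fn main() {
    // The fetch completes well within the timeout.
    let (tx, rx) = mpsc::channel::<ChunkResult>();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        let _ = tx.send(Ok(vec![1, 2, 3]));
    });
    println!("{:?}", wait_for_pending(&rx, Duration::from_secs(1)));

    // The sender stays alive but never sends: the timeout breaks the hang.
    let (_tx, rx) = mpsc::channel::<ChunkResult>();
    println!("{:?}", wait_for_pending(&rx, Duration::from_millis(50)));
}
```

On timeout, a caller could fall back to fetching the range itself; that fallback is an assumption, not something the PR specifies.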