Caching object store improvements: chunk coalescing

Open gruuya opened this issue 10 months ago • 0 comments

This PR introduces a new caching mechanism for fetching byte ranges:

  • If a chunk is not present in the cache, greedily extend the range to fetch by coalescing adjacent chunks that are also missing from the cache, so as to minimize the number of outgoing requests.
  • For each such chunk, put a Pending value in the cache, containing a channel over which the actual bytes will be sent.
  • If a task runs into a Pending value, it takes the receiver and waits on it with a timeout.
  • Once no more chunks can be coalesced, issue a get_range request for the extended range.
  • If that request errors out, send the error to any awaiting tasks; otherwise cache the bytes as the Memory variant and trigger persisting to disk (the File variant).
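
To illustrate the coalescing step described above, here is a minimal, std-only sketch. It is not the code from this PR: the helper names `coalesce_missing` and `byte_range`, the forward-only extension, and the use of a `HashSet` of chunk indices to stand in for the cache are all assumptions made for illustration. Any chunk that already has a cache entry (Pending, Memory, or File) is treated as present and stops the extension.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical helper: starting from a chunk that missed the cache, greedily
/// extend the half-open range of chunk indices over adjacent chunks that are
/// also missing. `present` holds the indices of chunks that already have a
/// cache entry of any kind.
fn coalesce_missing(first: usize, total_chunks: usize, present: &HashSet<usize>) -> Range<usize> {
    let mut end = first + 1;
    // Stop at the first chunk that is already cached (or in flight),
    // or at the end of the object.
    while end < total_chunks && !present.contains(&end) {
        end += 1;
    }
    first..end
}

/// Hypothetical helper: convert a range of chunk indices into the byte range
/// for a single request, clamping the end to the object length because the
/// last chunk may be shorter than `chunk_size`.
fn byte_range(chunks: &Range<usize>, chunk_size: usize, object_len: usize) -> Range<usize> {
    (chunks.start * chunk_size)..(chunks.end * chunk_size).min(object_len)
}
```

For example, with 1024-byte chunks, a 5000-byte object, and only chunk 3 cached, a miss on chunk 0 would yield chunks `0..3` and a single request for bytes `0..3072` instead of three separate requests.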

The timeout is important for breaking out of some hangs I've noticed (notably, TPC-H SF10 q18 and q21).
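
The shape of such a bounded wait on a Pending entry might look like the sketch below. It uses the standard library's blocking `mpsc` channel purely as a stand-in; the actual implementation presumably uses async channels, and the `WaitOutcome` type, the `wait_for_chunk` helper, and the `String` error type are hypothetical. What the caller does after a timeout is deliberately left open here.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical outcome of waiting on a Pending cache entry.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    /// The fetching task delivered the chunk bytes.
    Bytes(Vec<u8>),
    /// The fetching task forwarded an error from the range request.
    FetchFailed(String),
    /// Nothing arrived within the timeout, so the waiter stops blocking.
    TimedOut,
    /// The sending side went away without delivering anything.
    SenderDropped,
}

/// Wait for the chunk with an upper bound on how long to block, so that a
/// stuck or abandoned fetch cannot hang the waiting task forever.
fn wait_for_chunk(rx: &Receiver<Result<Vec<u8>, String>>, timeout: Duration) -> WaitOutcome {
    match rx.recv_timeout(timeout) {
        Ok(Ok(bytes)) => WaitOutcome::Bytes(bytes),
        Ok(Err(e)) => WaitOutcome::FetchFailed(e),
        Err(RecvTimeoutError::Timeout) => WaitOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::SenderDropped,
    }
}
```

Separating `TimedOut` from `SenderDropped` lets the waiter tell a slow fetch apart from one whose sender has gone away, which can be useful when diagnosing hangs.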

gruuya, Apr 15 '24 10:04