caffeine icon indicating copy to clipboard operation
caffeine copied to clipboard

Unnecessary fetches when using asynchronous bulk load

Open mudiobada opened this issue 1 year ago • 2 comments

For a cache backed by AsyncCacheLoader that

  • loads values for keys that were not specifically requested ("pre-fetch")
  • and where getAll() is called in close successions.

Observed behavior:

Depending on timing, asyncLoadAll() may sometimes be called unnecessarily for already pre-fetched items.

A reproducer can be provided if needed (that injects artificial delays in the map returned by asyncLoadAll()). The following notes on LocalAsyncCache should however explain the behavior:

  1. Completion of getAll() depends only on items that were actually requested
 return composeResult(futures);

https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L149

  1. AsyncBulkCompleter completes futures in the following order - first requested items, then pre-fetched items:
        fillProxies(result);
        addNewEntries(result);

https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L260

The above results in behavior subject to timing between completion of addNewEntries(result) and when the next getAll() is triggered.

If getAll() is triggered again after the Future returned by a previous getAll() completed, but before "pre-fetched" items are added to the cache by addNewEntries(result), then the second call of getAll() could re-trigger a fetch of items that were already fetched (but not yet added to the cache.

Assumption is that getAll() completion immediately after requested items are added to the cache (by fillProxies(result)) is an optimization.

For a scenario where asyncLoadAll() might be very expensive (as it is in my use case), this optimization is not the appropriate trade-off.

Question:

Would it be possible to add an option that allows completion of getAll() only after all fetched items have been added to the cache? The effect of this should be the same as swapping the order of fillProxies(result); and addNewEntries(result); in AsyncBulkCompleter.

mudiobada avatar Dec 20 '23 09:12 mudiobada

I think it would be fine to reorder those lines or await on the whenComplete future as well, as a refinement to the behavior. It makes sense that a caller might think that all of those unrequested keys were stored by the time their future completes.

ben-manes avatar Dec 20 '23 09:12 ben-manes

It would be great if behavior could generally be changed to guarantee completion. A wait on completion of both futures and proxies in getAll() would surely be easier to reason about and guard against regression in later releases.

mudiobada avatar Dec 20 '23 11:12 mudiobada

I ran into the same issue and thought I'd try my hand at the refinement with #1550.

allanrodriguez avatar Feb 25 '24 01:02 allanrodriguez

Thanks! I'll take a look. There are a few things that I want to review while we're looking at this functionality.

  1. Can we write tests to assert this behavior to avoid regressions?
  2. What happens if the cancels, completes, or times out the returned future?
  3. Similarly, is there any error case due to the chaining that we need to be careful about?

I think your test case change was for 1. Offhand I don't recall if 2 & 3 were handled very well presently.

ben-manes avatar Feb 25 '24 02:02 ben-manes