caffeine
caffeine copied to clipboard
Unnecessary fetches when using asynchronous bulk load
For a cache backed by AsyncCacheLoader that
- loads values for keys that were not specifically requested ("pre-fetch")
- and where getAll() is called in close successions.
Observed behavior:
Depending on timing, asyncLoadAll() may sometimes be called unnecessarily for already pre-fetched items.
A reproducer can be provided if needed (that injects artificial delays in the map returned by asyncLoadAll()). The following notes on LocalAsyncCache should however explain the behavior:
- Completion of getAll() depends only on items that were actually requested
return composeResult(futures);
https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L149
- AsyncBulkCompleter completes futures in the following order - first requested items, then pre-fetched items:
fillProxies(result);
addNewEntries(result);
https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L260
The above results in behavior subject to timing between completion of addNewEntries(result) and when the next getAll() is triggered.
If getAll() is triggered again after the Future returned by a previous getAll() completed, but before "pre-fetched" items are added to the cache by addNewEntries(result), then the second call of getAll() could re-trigger a fetch of items that were already fetched (but not yet added to the cache.
Assumption is that getAll() completion immediately after requested items are added to the cache (by fillProxies(result)
) is an optimization.
For a scenario where asyncLoadAll() might be very expensive (as it is in my use case), this optimization is not the appropriate trade-off.
Question:
Would it be possible to add an option that allows completion of getAll() only after all fetched items have been added to the cache?
The effect of this should be the same as swapping the order of fillProxies(result);
and addNewEntries(result);
in AsyncBulkCompleter.
I think it would be fine to reorder those lines or await on the whenComplete
future as well, as a refinement to the behavior. It makes sense that a caller might think that all of those unrequested keys were stored by the time their future completes.
It would be great if behavior could generally be changed to guarantee completion.
A wait on completion of both futures
and proxies
in getAll() would surely be easier to reason about and guard against regression in later releases.
I ran into the same issue and thought I'd try my hand at the refinement with #1550.
Thanks! I'll take a look. There are a few things that I want to review while we're looking at this functionality.
- Can we write tests to assert this behavior to avoid regressions?
- What happens if the cancels, completes, or times out the returned future?
- Similarly, is there any error case due to the chaining that we need to be careful about?
I think your test case change was for 1. Offhand I don't recall if 2 & 3 were handled very well presently.