guava icon indicating copy to clipboard operation
guava copied to clipboard

Cache's compute misreports the removal cause of a collected entry

Open ben-manes opened this issue 3 months ago • 7 comments

Guava Version

33.4.9-SNAPSHOT

Description

I tried to run the pre-release in Caffeine's test suite to verify the recent fix for cache computations (thanks @eamonnmcmanus). Other than different interpretations of stats this is pretty good, except that it uncovered a mistaken assumption about reference clean up. Here is the github action run.

After the compute locks it runs preWriteCleanup() which performs an amortized drain of the pending work. However it does not perform all that might have built up since it is on a caller's thread, e.g. a large cache might evacuate a lot of data. After this is performed, an existing entry may be found that is pending eviction but still present. In this test it returns null, so which leads to the mapping removal, and the listener reports this as an EXPLICIT removal cause with a null value.

/**
 * Maximum number of entries to be drained in a single cleanup run. This applies independently to
 * the cleanup queue and both reference queues.
 */
// TODO(fry): empirically optimize this
static final int DRAIN_MAX = 16;

@GuardedBy("this")
void drainValueReferenceQueue() {
  Reference<? extends V> ref;
  int i = 0;
  while ((ref = valueReferenceQueue.poll()) != null) {
    @SuppressWarnings("unchecked")
    ValueReference<K, V> valueReference = (ValueReference<K, V>) ref;
    map.reclaimValue(valueReference);
    if (++i == DRAIN_MAX) {
      break;
    }
  }
}

The null value in the removal notification should only occur for a COLLECTED cause, so the test suite notices this during the validation and Guava logs the rejection as Exception thrown by removal listener.

java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:222)
	at com.github.benmanes.caffeine.cache.testing.RemovalListeners.validate(RemovalListeners.java:53)
	at com.github.benmanes.caffeine.cache.testing.RemovalListeners$ConsumingRemovalListener.onRemoval(RemovalListeners.java:90)
	at com.github.benmanes.caffeine.cache.testing.GuavaCacheFromContext$GuavaRemovalListener.onRemoval(GuavaCacheFromContext.java:528)
	at com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1841)
	at com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3488)
	at com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3464)
	at com.google.common.cache.LocalCache$Segment.compute(LocalCache.java:2295)
	at com.google.common.cache.LocalCache.compute(LocalCache.java:4230)
	at com.github.benmanes.caffeine.cache.testing.GuavaCacheFromContext$GuavaCache$AsMapView.compute(GuavaCacheFromContext.java:334)
	at com.github.benmanes.caffeine.cache.ReferenceTest.compute_nullValue(ReferenceTest.java:1033)

Example

@Test(dataProvider = "caches")
@CacheSpec(population = Population.FULL, values = {ReferenceType.WEAK}, implementation = Implementation.Guava, //, ReferenceType.SOFT},
    expireAfterAccess = Expire.DISABLED, expireAfterWrite = Expire.DISABLED,
    maximumSize = Maximum.DISABLED, weigher = CacheWeigher.DISABLED,
    stats = Stats.ENABLED, removalListener = Listener.CONSUMING)
public void compute_nullValue(Map<Int, Int> map, CacheContext context) {
  Int key = context.firstKey();
  var collected = getExpectedAfterGc(context,
      Maps.filterKeys(context.original(), not(equalTo(key))));
  collected.add(new SimpleEntry<>(key, null));

  context.clear();
  GcFinalization.awaitFullGc();
  assertThat(map.compute(key, (k, v) -> {
    assertThat(v).isNull();
    return null;
  })).isNull();
  assertThat(context.cache()).whenCleanedUp().isEmpty();
  assertThat(context).notifications().withCause(COLLECTED)
      .contains(collected).exclusively();
}

Expected Behavior

In other logic, like lockedGetOrLoad, after the pre-write cleanup the entry found in the hash table is validated before usage.

if (value == null) {
  enqueueNotification(
      entryKey, hash, value, valueReference.getWeight(), RemovalCause.COLLECTED);
} else if (map.isExpired(e, now)) {
  // This is a duplicate check, as preWriteCleanup already purged expired
  // entries, but let's accommodate an incorrect expiration queue.
  enqueueNotification(
      entryKey, hash, value, valueReference.getWeight(), RemovalCause.EXPIRED);

Actual Behavior

FAILED: com.github.benmanes.caffeine.cache.ReferenceTest.compute_nullValue(AsMapView, CacheContext{population=FULL, maximumSize=DISABLED, weigher=DISABLED, expiry=DISABLED, expiryTime=FOREVER, afterAccess=DISABLED, afterWrite=DISABLED, refreshAfterWrite=FOREVER, keyStrength=WEAK, valueStrength=WEAK, compute=SYNC, loader=NEGATIVE, isAsyncLoader=false, cacheExecutor=DIRECT, cacheScheduler=DISABLED, removalListener=CONSUMING, evictionListener=DISABLED, initialCapacity=DEFAULT, stats=ENABLED, implementation=Guava, startTime=-2126573393550924486})
value of                    : cacheContext.context.removalListener
missing                     : {COLLECTED=[1000=null [COLLECTED]]}
---
expected to contain at least: {COLLECTED=[null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], 1000=null [COLLECTED]]}
but was                     : {COLLECTED=[null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED], null=null [COLLECTED]]}
cacheContext was            : CacheContext{population=FULL, maximumSize=DISABLED, weigher=DISABLED, expiry=DISABLED, expiryTime=FOREVER, afterAccess=DISABLED, afterWrite=DISABLED, refreshAfterWrite=FOREVER, keyStrength=WEAK, valueStrength=WEAK, compute=SYNC, loader=NEGATIVE, isAsyncLoader=false, cacheExecutor=DIRECT, cacheScheduler=DISABLED, removalListener=CONSUMING, evictionListener=DISABLED, initialCapacity=DEFAULT, stats=ENABLED, implementation=Guava, startTime=-2126573393550924486}
	at com.github.benmanes.caffeine.cache.ReferenceTest.compute_nullValue(ReferenceTest.java:1039)

Packages

com.google.common.cache

Platforms

No response

Checklist

  • [x] I agree to follow the code of conduct.

  • [x] I can reproduce the bug with the latest version of Guava available.

ben-manes avatar Sep 13 '25 22:09 ben-manes

Hi @chaoren I’d like to contribute this issue Is it okay if I take this issue and prepare an implementation?

bandalgomsu avatar Sep 26 '25 05:09 bandalgomsu

The issue occurs in the compute() method in LocalCache.java. After preWriteCleanup() drains the reference queues (potentially evicting entries), an existing entry might still be present in the map but with a null value (due to garbage collection). The current logic doesn't validate the entry post-cleanup like lockedGetOrLoad() does, leading to a removal with RemovalCause.EXPLICIT instead of RemovalCause.COLLECTED when the computed value is null. Am i right with the observation ?

Tanush-Jain avatar Oct 30 '25 06:10 Tanush-Jain

Would this approach of Adding entry validation in compute() after preWriteCleanup(), similar to the check in lockedGetOrLoad(). Specifically, if valueReference.get() == null and valueReference.isActive(), enqueue a collected notification and remove the entry before proceeding with the computation. This may ensure the removal cause is correct without changing broader logic. Is it suppose to be like this?

Tanush-Jain avatar Oct 30 '25 06:10 Tanush-Jain

I was planning to prepare a PR using an approach like this (in compute())

V oldValue = valueReference.get();

RemovalCause cause;
if (oldValue == null && valueReference.isActive()) {
    cause = RemovalCause.COLLECTED;
} else {
    cause = RemovalCause.EXPLICIT;
}

removeEntry(e, hash, cause);
return null;

It ensures that if the entry was already garbage-collected but still active, the removal cause is correctly reported as COLLECTED instead of EXPLICIT.

Thanks

bandalgomsu avatar Oct 30 '25 06:10 bandalgomsu

@bandalgomsu , Hey did you create a Pr for this?

Tanush-Jain avatar Oct 31 '25 08:10 Tanush-Jain

Since there hasn’t been any feedback on the comment, I haven’t opened a PR yet.

Would it be okay if I create a PR?

bandalgomsu avatar Oct 31 '25 11:10 bandalgomsu

@bandalgomsu Yes, you can if it passes all test cases and scenarios .

Tanush-Jain avatar Nov 01 '25 05:11 Tanush-Jain