Invalidation race causing stale cache for io.lettuce.core.support.caching.ClientSideCaching

Open IvarHenckel opened this issue 2 months ago • 3 comments

Bug Report

Current Behavior

We are using a CacheFrontend created from io.lettuce.core.support.caching.ClientSideCaching.create(CacheAccessor, StatefulRedisConnection). We have experienced the cache going stale, i.e. not being updated even though writes are sent to Redis.
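
For context, the frontend is created roughly like this (a minimal sketch; the connection setup, the local map, and the String key/value types are illustrative only, not our exact code):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.support.caching.CacheAccessor;
    import io.lettuce.core.support.caching.CacheFrontend;
    import io.lettuce.core.support.caching.ClientSideCaching;

    public class ClientSideCacheSetup {

        public static void main(String[] args) {

            RedisClient client = RedisClient.create("redis://localhost");
            StatefulRedisConnection<String, String> connection = client.connect();

            // Local near cache; CacheAccessor.forMap wraps it in a MapCacheAccessor.
            Map<String, String> localCache = new ConcurrentHashMap<>();

            // Note: CLIENT TRACKING needs to be active on this connection so that
            // invalidation messages arrive and evict local entries.
            CacheFrontend<String, String> frontend =
                    ClientSideCaching.create(CacheAccessor.forMap(localCache), connection);

            // Reads go through the local map first and fall back to Redis on a miss.
            System.out.println(frontend.get("some-key"));
        }
    }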

I identified why this happens by modifying the current implementation of ClientSideCaching.get() as follows:

    // Copied MapCacheAccessor and made two small adjustments needed below:
    // getRedisInProgressPlaceHolder -> returns a temporary marker value signaling that a Redis read is in progress, e.g. "REDIS_IN_PROGRESS" if V is String.
    // putWithReturn -> same as put, but returns the previous value (atomically).

    @Override
    public V get(K key) {

        V value = cacheAccessor.get(key);

        if (value == null) {
            cacheAccessor.put(key, cacheAccessor.getRedisInProgressPlaceHolder()); // new
            value = redisCache.get(key);

            if (value != null) {
                final V beforePut = cacheAccessor.putWithReturn(key, value); // edited
                if (beforePut != cacheAccessor.getRedisInProgressPlaceHolder()) { // new
                    System.out.println("Issue detected! For key: " + key); // new
                } // new
            }
        }

        return value;
    }

With this change there is a temporary placeholder in the cache until the Redis GET completes. However, the log statement above is sometimes triggered, meaning the cache entry was updated in between. This can happen because of an invalidation, and it can leave the cache stale. Consider this scenario:

  1. get is called, there is no cache entry, so a Redis GET is issued. The tracking table on the Redis server is updated.
  2. The thread executing get() above is slowed down (e.g. by preemption or contention on the cache) before putting the value into the cache.
  3. The key is updated in Redis, so an invalidation is fired. A thread runs the DefaultRedisCache.addInvalidationListener callback and the cache entry is evicted.
  4. The thread that was slowed down in step 2 continues and adds its (now outdated) value to the cache.

We end up with a stale cache: the cache has an entry for this key, so we will never request it from Redis again unless the value is evicted. But it will never be evicted, because from the Redis server's point of view the invalidation has already happened, so the key is already considered invalidated. The log statement does get triggered for me, and I verified that after updating this key's value in Redis, our service calling get on the cache still gets the old value.
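
For reference, the eviction in step 3 comes from the invalidation listener that ClientSideCaching registers; conceptually it does something like this (an approximation, not the exact library code):

    // Approximation of the wiring inside ClientSideCaching: every invalidation
    // message for a key simply evicts the corresponding local entry, with no
    // coordination with an in-flight get() for the same key.
    redisCache.addInvalidationListener(key -> cacheAccessor.evict(key));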

Environment

  • Lettuce version(s): 6.4.2 (but the code in question looks the same on the latest version)
  • Redis version: 6.2.6

Possible Solution

Evicting the key when the race is detected in the code snippet above should resolve it, i.e. evict where the log statement currently is. All null checks should probably also treat the getRedisInProgressPlaceHolder value as null.
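
To make that concrete, here is a sketch of get() with both changes applied (still using the hypothetical getRedisInProgressPlaceHolder()/putWithReturn() additions from the snippet above):

    @Override
    public V get(K key) {

        V value = cacheAccessor.get(key);

        // Treat the in-progress placeholder like a cache miss, so callers never
        // see the marker value and concurrent readers fall through to Redis.
        if (value == null || value == cacheAccessor.getRedisInProgressPlaceHolder()) {

            cacheAccessor.put(key, cacheAccessor.getRedisInProgressPlaceHolder());
            value = redisCache.get(key);

            if (value != null) {
                V beforePut = cacheAccessor.putWithReturn(key, value);
                if (beforePut != cacheAccessor.getRedisInProgressPlaceHolder()) {
                    // An invalidation removed (or another writer replaced) the entry
                    // while we were reading from Redis: evict so the next read goes
                    // back to Redis instead of serving a possibly stale value.
                    cacheAccessor.evict(key);
                }
            }
        }

        return value;
    }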

Additional context

In our case we were hitting the limit of keys in the tracking table (tracking-table-max-keys), so invalidations due to a full table were frequent, and I could therefore reproduce the issue often. This is of course not the recommended environment, but nonetheless the race above is possible with just normal writes.

IvarHenckel avatar Oct 21 '25 14:10 IvarHenckel

What do you think about adding a compareAndSwap() (or compareAndReplace()) method to CacheAccessor?

It would allow ClientSideCaching to perform atomic conditional puts and prevent stale cache insertions caused by invalidation races.
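
Roughly along these lines (the method name, its semantics, and how get() would use it are just a suggestion, not existing API; the get/put/evict methods are shown only for context):

    public interface CacheAccessor<K, V> {

        V get(K key);

        void put(K key, V value);

        void evict(K key);

        /**
         * Suggested addition: atomically replace the mapping for {@code key} with
         * {@code newValue}, but only if it is currently mapped to {@code expectedValue}.
         *
         * @return {@code true} if the value was swapped, {@code false} if the entry was
         *         missing or had changed (e.g. evicted by an invalidation) in the meantime.
         */
        boolean compareAndSwap(K key, V expectedValue, V newValue);
    }

ClientSideCaching.get() could then publish the fetched value with compareAndSwap(key, placeholder, value) and simply skip the insert when the swap fails, since a failed swap means an invalidation (or another writer) won the race.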

bandalgomsu avatar Oct 22 '25 00:10 bandalgomsu

That sounds good to me. Should I make a PR? I wouldn't mind doing that. Or are you already working on a solution?

IvarHenckel avatar Oct 22 '25 09:10 IvarHenckel

Folks, the team would like to focus on deprecating the current solution in favor of #3289.

Since we can't really commit to a timeline right now, we are willing to consider contributions to the old logic.

However, please keep in mind that this is considered lower priority.

tishun avatar Oct 31 '25 11:10 tishun