fast-cache

Cache key gets stuck forever

Open troepolik opened this issue 3 months ago • 6 comments

We have noticed that sometimes a cache key gets stuck in the cache forever, as if I had set expiration = infinite. Only restarting the pod helps.

I am not sure whether it is related, but we also use the Remove method (for a hot reload feature, where a cache key should be expired by a domain event):

if (Cached<TValue>.TryGet(internalKey, out var cacheValue))
    cacheValue.Remove();

So far we have noticed this problem only with types that have the hot reload feature, so I guess it may be related.

Cache key type in our case (if it is important): record struct InternalCacheKey<TKey>(TKey Key, string MethodName, string FilePath); TKey is (short id, short langId), but we had the same issue with different keys.

expirationTime = TimeSpan.FromSeconds(600) (10 minutes)

I tried to reproduce it but had no success. It also does not happen often in production, but it causes a lot of problems. Maybe you have an idea of the reason for this bug?

We use only 3 methods: Cached<TValue>.TryGet (for one key), Cached<TValue>.Save (for one key), and cacheValue.Remove().

troepolik avatar Mar 20 '24 15:03 troepolik

Hi, this sounds like something that may occur with struct tearing (observing partial updates of cache item since it is a struct that holds TValue and long expiration timestamp), but in practice it should not be possible to reach since the dictionary implementation should be swapping the node reference rather than contents.
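To make the tearing concern concrete, here is a minimal sketch of the "swap the node reference" idea. It is not the FastCache or NonBlocking code; `Item`, `Node` and `Slot` are hypothetical names used only for illustration. A struct holding a reference plus a `long` is wider than one machine word, so copying it in place is not guaranteed to be atomic, whereas publishing an immutable node through a single reference write is.

```csharp
using System;
using System.Threading;

// Hypothetical cache item: a value plus an expiration timestamp.
// It is wider than one machine word, so an in-place copy could be observed half-written.
public readonly record struct Item(string Value, long ExpiresAt);

// Hypothetical immutable node that wraps the struct in a reference type.
public sealed class Node
{
    public readonly Item Item;
    public Node(Item item) => Item = item;
}

public sealed class Slot
{
    private Node _node;

    // The whole item becomes visible at once because only one reference is written.
    public void Publish(Item item) => Volatile.Write(ref _node, new Node(item));

    // Readers see either the old node or the new node, never a mix of both.
    public bool TryRead(out Item item)
    {
        var node = Volatile.Read(ref _node);
        if (node is null)
        {
            item = default;
            return false;
        }
        item = node.Item;
        return true;
    }
}
```

Under these assumptions, a reader calling `TryRead` concurrently with `Publish` gets a consistent `Value`/`ExpiresAt` pair, which is why tearing should not be reachable if the dictionary really swaps node references.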

Could you give more details on the hot reload feature you are using in this context?

neon-sunset avatar Mar 20 '24 15:03 neon-sunset

Hi, hot reload just listens to some domain events and removes the key from the cache by calling Remove, which forces the next read request to reread it from the db and save it to the cache. Example of removing by event for one of the keys:

// short id, short langId come from the event data
var internalKey = new InternalCacheKey<(short, short)>((id, langId), methodName, filePath); // build the key to find it in the cache
if (Cached<TValue>.TryGet(internalKey, out var cacheValue))
    cacheValue.Remove();

troepolik avatar Mar 21 '24 06:03 troepolik

A little update: I got a dump from the pod with the problem and found the corresponding entry in the cache dictionary _entries. Honestly, I don't see any problem: this entry is marked with TOMBSTONE (a static value) in its value, so it is a "deleted" value. Locally I created the same value in the cache and everything works well: the cache just returns false from the TryGet method. So for now I have no idea how the cache returns some value)) I suspected there was another entry instance with the same key, but no, there is only one in the dump. There is also only one instance of the entries array and only one instance of DictionaryImpl.

Here is the entry value from the dump:

{
      "hash": -159001085,
      "key": {
        "@ref": "0x00007fd2f1da13d0",
        "@type": "NonBlocking.Boxed<FastMemoryCache+InternalCacheKey<ValueTuple<Int16, Int16>>>",
        "writeStatus": 0,
        "Value.<Key>k__BackingField.Item1": 118,
        "Value.<Key>k__BackingField.Item2": 36,
        "Value.<MethodName>k__BackingField": "someMethodName",
        "Value.<FilePath>k__BackingField": "/__w/1/s/somePathToFile.cs"
      },
      "value": {
        "@ref": "0x00007fd2dd237e68",//address of TOMBSTONE
        "@type": "System.Object",
        "@comment": "written above"
      }
 }

0x00007fd2dd237e68 is the address of TOMBSTONE. I figured that out because it is an empty object and I found the same address in the PRIME object instance:

{
  "@ref": "0x00007fd2dd237e80",
  "@type": "NonBlocking.DictionaryImpl+Prime",
  "originalValue": {
    "@ref": "0x00007fd2dd237e68",
    "@type": "System.Object"
  }
}

For now I couldn't understand the reason, but I'll continue the investigation. Maybe next time I should get a dump with type="WithHeaps" instead of "Full" to be able to debug it.
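The expected behaviour described above (a tombstoned entry makes TryGet return false) can be sketched with a toy map. This is only an illustration under my own assumptions, not the NonBlocking.DictionaryImpl code: `TombstoneMap` is a hypothetical, single-threaded stand-in, and the real dictionary is lock-free and far more involved.

```csharp
using System;
using System.Collections.Generic;

// Toy, single-threaded illustration of tombstone semantics (hypothetical type).
public sealed class TombstoneMap<TKey, TValue> where TKey : notnull
{
    // One shared sentinel object; an entry whose value is this exact instance counts as deleted.
    private static readonly object TOMBSTONE = new object();

    private readonly Dictionary<TKey, object> _entries = new();

    public void Save(TKey key, TValue value) => _entries[key] = value!;

    // Removal keeps the key in place and overwrites the value with the sentinel.
    public void Remove(TKey key)
    {
        if (_entries.ContainsKey(key))
            _entries[key] = TOMBSTONE;
    }

    // A lookup that finds the sentinel must report a miss.
    public bool TryGet(TKey key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var raw) && !ReferenceEquals(raw, TOMBSTONE))
        {
            value = (TValue)raw;
            return true;
        }
        value = default!;
        return false;
    }
}
```

In this sketch, calling `Save`, then `Remove`, then `TryGet` for the same key yields false, which matches what I see locally and what the dump suggests should have happened on the pod.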

troepolik avatar Apr 10 '24 08:04 troepolik

Interesting, and thanks for looking into this further. I completely forgot about this issue but will look into NonBlocking dictionary impl. again on the off chance it has a race condition or a logic bug that may lead to such a scenario. Indeed, TOMBSTONE entry should never be returned...

Going back to the issue description, and to clarify: in which way does the cache item being stuck manifest in your application code? Just TryGet returning true, or something else?

neon-sunset avatar Apr 10 '24 12:04 neon-sunset

Yes, it looks like TryGet is returning true, because the value in the db was updated and the other pods started returning the new value. But one pod returns the old one, even after a few hours. expirationTime = TimeSpan.FromSeconds(600) (10 minutes).

troepolik avatar Apr 10 '24 13:04 troepolik

Even if TOMBSTONE were returned, I cannot imagine how it could be transformed into our cached object. We use the object right after getting it from the cache and there is no exception (like a null reference).

Thanks for your attention)

troepolik avatar Apr 10 '24 13:04 troepolik