
Potential race condition with insert-wait-get

Open blind-oracle opened this issue 8 months ago • 2 comments

Hi folks, I'm not sure this is a bug, but I can't explain it otherwise.

We have a cache in our service built on stretto (sync API), with simple semantics:

  • try_insert_with_ttl() then wait()
  • get() to fetch the value

I have a test that runs a simple insert/wait/get sequence to check that a given entry exists in the cache, and in our CI/CD (Bazel) this test sometimes fails: get() reports that the key is missing. The problem is that I cannot reproduce this locally; it passes 100% of the time, even when I run it thousands of times.

I create the cache with a sufficiently large max_cost and use a TTL of 3600 s to make sure the entry won't be evicted.
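The expectation behind this test can be sketched against a synchronous stand-in cache. `StandInCache` is hypothetical and is not stretto: it borrows the method names from the list above but applies writes immediately, enforces `max_cost` by summing entry costs, and treats an entry as live until its insert time plus TTL. Under those assumptions, an entry inserted into an empty cache with a large `max_cost` and a 3600 s TTL is readable right away. The sketch also asserts on the insert's return value before `wait()`, which separates "the write was dropped up front" from "the entry vanished later".

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical synchronous stand-in for a cost-bounded TTL cache (NOT stretto).
/// Writes are applied immediately, so `wait()` has nothing to flush.
struct StandInCache {
    max_cost: i64,
    // key -> (value, cost, expiry instant)
    entries: Mutex<HashMap<String, (String, i64, Instant)>>,
}

impl StandInCache {
    fn new(max_cost: i64) -> Self {
        Self { max_cost, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns false (write dropped) if the entry would push the total cost past `max_cost`.
    fn try_insert_with_ttl(&self, key: &str, val: &str, cost: i64, ttl: Duration) -> bool {
        let now = Instant::now();
        let mut map = self.entries.lock().unwrap();
        // Expired entries no longer count against the budget.
        map.retain(|_, (_, _, expires_at)| *expires_at > now);
        // Cost of everything else in the cache (an entry with the same key would be replaced).
        let used: i64 = map
            .iter()
            .filter(|(k, _)| k.as_str() != key)
            .map(|(_, (_, c, _))| *c)
            .sum();
        if used + cost > self.max_cost {
            return false;
        }
        map.insert(key.to_string(), (val.to_string(), cost, now + ttl));
        true
    }

    /// No-op in this stand-in; a buffered cache would block here until pending writes land.
    fn wait(&self) {}

    /// Returns the value only while `now < insert time + ttl`.
    fn get(&self, key: &str) -> Option<String> {
        let map = self.entries.lock().unwrap();
        match map.get(key) {
            Some((val, _, expires_at)) if Instant::now() < *expires_at => Some(val.clone()),
            _ => None,
        }
    }
}

fn main() {
    // Large max_cost and a 3600 s TTL, as in the test described above.
    let cache = StandInCache::new(1_000_000);
    // Assert on the insert result first: `false` would mean the write was dropped up front.
    assert!(cache.try_insert_with_ttl("key", "value", 1, Duration::from_secs(3600)));
    cache.wait();
    assert_eq!(cache.get("key").as_deref(), Some("value"));
    println!("entry present after insert/wait/get");
}
```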

I'd be grateful for any hints on how to debug this; maybe I'm doing something wrong. But my usage seems consistent with the code in https://github.com/al8n/stretto/blob/main/examples/sync_example.rs

blind-oracle, Oct 16 '23 12:10

Hi, I couldn't reproduce it on my machine, but this may be because of how the current implementation works: a new entry is first pushed to a write buffer and only then added to the map, and if the write buffer is full, some inserts are dropped outright. Could you share your test code to help me reproduce the problem?
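The write path described in this reply can be modeled with a bounded channel: inserts are queued with a non-blocking send, a background thread applies them to the map, and `wait()` acts as a barrier that returns only after everything queued before it has been applied. This is a toy model under those assumptions, not stretto's implementation; `BufferedCache`, `Op`, and the buffer sizes are hypothetical. It shows two things: a `get()` after `wait()` sees every accepted insert, and when the buffer is full an insert is dropped and reported as such.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;

// Messages for the model's background worker (hypothetical, not stretto's types).
enum Op {
    Insert(String, String),
    // Barrier used by wait(): acknowledged once all earlier ops are applied.
    Flush(SyncSender<()>),
}

/// Toy model of a cache whose writes pass through a bounded buffer (NOT stretto's code).
struct BufferedCache {
    buf: SyncSender<Op>,
    map: Arc<Mutex<HashMap<String, String>>>,
}

impl BufferedCache {
    fn new(buffer_size: usize) -> Self {
        let (tx, rx) = sync_channel::<Op>(buffer_size);
        let map = Arc::new(Mutex::new(HashMap::new()));
        let worker_map = Arc::clone(&map);
        // Background thread: drains the buffer into the map in FIFO order.
        thread::spawn(move || {
            for op in rx {
                match op {
                    Op::Insert(k, v) => {
                        worker_map.lock().unwrap().insert(k, v);
                    }
                    Op::Flush(ack) => {
                        let _ = ack.send(());
                    }
                }
            }
        });
        Self { buf: tx, map }
    }

    /// Non-blocking: returns false when the buffer is full and the write is dropped.
    fn insert(&self, key: &str, val: &str) -> bool {
        self.buf
            .try_send(Op::Insert(key.to_string(), val.to_string()))
            .is_ok()
    }

    /// Blocks until every write queued before this call has reached the map.
    fn wait(&self) {
        let (ack_tx, ack_rx) = sync_channel(1);
        self.buf.send(Op::Flush(ack_tx)).unwrap();
        ack_rx.recv().unwrap();
    }

    fn get(&self, key: &str) -> Option<String> {
        self.map.lock().unwrap().get(key).cloned()
    }
}

fn main() {
    // Normal path: the write is buffered, wait() flushes it, get() sees it.
    let cache = BufferedCache::new(64);
    assert!(cache.insert("key", "value"));
    cache.wait();
    assert_eq!(cache.get("key").as_deref(), Some("value"));

    // Full-buffer path: stall the worker by holding the map lock. It can take at
    // most one op off the buffer, so at most buffer_size + 1 = 3 of 4 writes fit.
    let small = BufferedCache::new(2);
    let guard = small.map.lock().unwrap();
    let mut accepted = 0;
    for i in 0..4 {
        if small.insert(&format!("k{i}"), "v") {
            accepted += 1;
        }
    }
    drop(guard);
    small.wait();
    let present = (0..4).filter(|i| small.get(&format!("k{i}")).is_some()).count();
    assert!(accepted < 4);
    assert_eq!(present, accepted);
    println!("accepted {accepted} of 4 writes; {present} visible after wait()");
}
```

In this model, a miss after `wait()` can only come from an insert that was rejected up front. Assuming the real `try_insert_with_ttl()` likewise reports whether the write was accepted, asserting on that value in the failing CI test would be one way to tell a dropped write apart from an entry that disappeared later.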

al8n, Oct 17 '23 15:10

@al8n Thanks for the effort. My code probably won't help, since I can't reproduce the failure either when running locally outside Bazel. It's hard for me to tell how these environments differ; they should be the same (and we have thousands of other tests that run fine). But there must be some subtle difference that causes this...

Effectively I'm doing the simple thing I described initially: insert a value under some key, then immediately (well, after wait()) check whether it's there. This sometimes gives me a cache miss. When the value is inserted, the cache is empty and freshly created, so the insert shouldn't be dropped.

Maybe there's some initialization phase after the Cache object is created with CacheBuilder.finalize() (threads being spawned, etc.)? Though that wouldn't explain why it never fails locally.

I've now switched the cache to stretto's async API and will check whether that causes the same issue...

blind-oracle, Oct 17 '23 15:10