Kache icon indicating copy to clipboard operation
Kache copied to clipboard

InMemoryKache::getOrPut fails in high concurrency context

Open vfsfitvnm opened this issue 1 year ago • 4 comments

Hi :smile_cat: I noticed I was getting NullPointerExceptions when using SHA256KeyHasher. I could narrow it down to InMemoryKache - the following test fails:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

Stack trace:

java.lang.NullPointerException
	at com.mayakapps.kache.InMemoryKacheTest$concurrency$1$2$1.invokeSuspend(InMemoryKacheTest.kt:36)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.test.TestDispatcher.processEvent$kotlinx_coroutines_test(TestDispatcher.kt:24)
	at kotlinx.coroutines.test.TestCoroutineScheduler.tryRunNextTaskUnless$kotlinx_coroutines_test(TestCoroutineScheduler.kt:99)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$2$1$workRunner$1.invokeSuspend(TestBuilders.kt:322)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:277)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:95)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersJvmKt.createTestResult(TestBuildersJvm.kt:10)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:310)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:168)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0$default(TestBuilders.kt:160)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0$default(Unknown Source)
	at com.mayakapps.kache.InMemoryKacheTest.concurrency(InMemoryKacheTest.kt:32)

vfsfitvnm avatar Jun 08 '24 10:06 vfsfitvnm

We are seeing similar NPEs in this code.

Uncaught Kotlin exception: kotlin.NullPointerException
    at 0   AppEcc                              0x10587ad97        kfun:kotlin.Throwable#<init>(){} + 95 
    at 1   AppEcc                              0x105874167        kfun:kotlin.Exception#<init>(){} + 87 
    at 2   AppEcc                              0x105874387        kfun:kotlin.RuntimeException#<init>(){} + 87 
    at 3   AppEcc                              0x1058745a7        kfun:kotlin.NullPointerException#<init>(){} + 87 
    at 4   AppEcc                              0x1058af41f        ThrowNullPointerException + 131 
    at 5   AppEcc                              0x105d8f4d7        kfun:com.mayakapps.kache.SHA256KeyHasher.$transformCOROUTINE$0.invokeSuspend#internal + 783 
    at 6   AppEcc                              0x1059afd1f        kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 67 

SHA256KeyHasher has the following code:

    override suspend fun transform(oldKey: String): String = hashedCache.getOrPut(oldKey) {
        oldKey.encodeUtf8().sha256().hex().uppercase()
    }!! // Since our creation function never returns null, we can use not-null assertion

Notice the comment at the end.

Now look at the following:

    override suspend fun getOrPut(key: K, creationFunction: suspend (key: K) -> V?): V? {
        get(key)?.let { return it }

        creationMutex.withLock {
            if (creationMap[key] == null && map[key] == null) {
                @Suppress("DeferredResultUnused")
                internalPutAsync(key, creationFunction) // <!--- ASYNC
            }
        }

        return get(key) // <!-- DID ASYNC ABOVE FINISH?
    }

I know nothing about coroutines in Kotlin, but I see an async call that I'm guessing is not guaranteed to finish before the return get call at the end of the method, resulting in a nasty race condition where sometimes you'll get a value and other times you will not. This seems extremely bad.

paulrs avatar Jul 18 '24 14:07 paulrs

I was able to replicate the crash using the unit test provided above:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

I found that by replacing the second call to get(key) in the getOrPut(…) function with a new function I called unsafeGet(key: K): V, the problem goes away. This is my unsafeGet(…) implementation:

private suspend fun unsafeGet(key: K): V = (getFromCreation(key) ?: getIfAvailable(key))!!

Since unsafeGet(…) is a copy of get(…), but with evictExpired() removed (and a force unwrap added), this suggests to me that evictExpired() may be triggering a race condition. However, my Kotlin experience is limited, so my confidence in this analysis is not particularly high.

As an aside, by calling get(…) twice, evictExpired() is being called twice. Is this really necessary in a single call to getOrPut(…)? Calling evictExpired() twice seems superfluous and possibly a performance issue.

djehrlich avatar Jul 19 '24 21:07 djehrlich

Thank you for reporting the issue and investigating it, and sorry for being late.

I would like to further investigate and try to fix the issue in the next few days.

MSDarwish2000 avatar Sep 21 '24 22:09 MSDarwish2000

Any update on this after several months?

jimirocks avatar Jun 09 '25 05:06 jimirocks