aws-sdk-kotlin
Memory Leak with DynamoDB because of KtorEngine
Describe the bug
There is a slow memory leak when using DynamoDB
The origin of the leak seems to reside with the KtorEngine, which is used by smithy's SdkHttpClient. I have opened a ticket with them: https://youtrack.jetbrains.com/issue/KTOR-4823/Memory-Leak-on-KtorEngine-
This is what we have been experiencing (and have reproduced several times):
We have an application on which we ramped up shadow traffic to DynamoDB to a few hundred TPS. Within about a day, the leak grows to ~850 MB of heap memory that is never collected. In a heap dump, the DynamoDBClient is holding on to 90% of the heap: hundreds of thousands of objects from the kotlinx.coroutines package that are not released.

Expected behavior
We expect the Kotlin SDK not to leak memory.
Current behavior
If you run a few hundred TPS for a day and take a heap dump, you will see that DynamoDBClient is leaking memory through the underlying KtorEngine

Steps to Reproduce
Run a Kotlin application that uses coroutines/suspend functions and reads from and writes to DynamoDB through the DynamoDBClient at a rate of a few hundred TPS. After a few hours, take a heap dump and check which objects are holding on to heap memory. The DynamoDBClient should be near the top, with thousands of references to coroutine objects.
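A rough sketch of a load harness for the steps above, in case it helps with a reproduction. This is not the code from the affected application: `runLoad`, its parameters, and the `operation` lambda are hypothetical names, and the lambda is only a stand-in for a real DynamoDB read or write. It assumes kotlinx-coroutines-core is on the classpath.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicLong

/**
 * Launches roughly [tps] coroutines per second for [seconds] seconds.
 * [operation] is a hypothetical placeholder for a suspend call on the client.
 * Returns the number of operations that completed.
 */
suspend fun runLoad(tps: Int, seconds: Int, operation: suspend (Long) -> Unit): Long {
    val completed = AtomicLong()
    // coroutineScope returns only after every launched child has finished.
    coroutineScope {
        var nextId = 0L
        repeat(seconds) {
            val tickStart = System.nanoTime()
            repeat(tps) {
                val requestId = nextId++
                launch(Dispatchers.IO) {
                    operation(requestId)
                    completed.incrementAndGet()
                }
            }
            // Sleep for the remainder of the one-second tick.
            val elapsedMs = (System.nanoTime() - tickStart) / 1_000_000
            delay((1_000 - elapsedMs).coerceAtLeast(0))
        }
    }
    return completed.get()
}

fun main() = runBlocking {
    // delay(5) simulates request latency; replace it with a real client call.
    val done = runLoad(tps = 200, seconds = 3) { delay(5) }
    println("completed operations: $done")
}
```

For a real run, the duration would need to be hours rather than seconds, with heap dumps taken along the way.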
Possible Solution
Work with the Ktor folks to fix it?
I have seen a similar but not identical bug on their tracking system https://youtrack.jetbrains.com/issue/KTOR-4288
I have myself opened up this one: https://youtrack.jetbrains.com/issue/KTOR-4823/Memory-Leak-on-KtorEngine-
If you could make sure the issue has traction, it would be great!
Context
We are looking to use the Kotlin SDK in production as soon as we find it to be stable enough. We understand that right now, it is still in early release, but if it were not for the leak, the client for DynamoDB is behaving very well and we'd be willing to try it in a production environment.
AWS Kotlin SDK version used
0.15.0, 0.16.0, 0.17.0-beta
Platform (JVM/JS/Native)
Java 11 (Corretto) - Kubernetes
Operating System and version
Docker, Java 11.0.14.1
Hi @vgiguere, thanks for the bug report. To confirm a few things:
The SDK's default HTTP engine was changed to be OkHttp (not Ktor) in 0.16.4-beta. We still provide a Ktor engine in the latest versions but it must be manually selected during client configuration. You mentioned that the issue occurs on 0.17.0-beta. When using that version, are you specifically configuring Ktor as an engine for DynamoDbClient? Does using the default OkHttp engine cause the same issue?
Apologies - I guess my browser filled in that version field from a previous issue I had submitted and I did not notice. The version we tested and profiled was 0.16.0 - I will test 0.16.4-beta and hopefully the problem is gone.
Thank you ;)
I cannot reproduce a strictly-increasing memory leak, even after running for several hours at high TPS. When taking periodic heap dumps, I do occasionally see 10K+ CombinedContext objects, but subsequent dumps show far fewer objects and lower memory usage. The memory usage/retention seems to fluctuate to a large degree, as I'd expect with highly-concurrent, high-bandwidth code.
@vgiguere Have you taken multiple heap dumps over the lifecycle of your application? Do they always show increasing (vs decreasing) utilization of CombinedContext?
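One way to answer the increasing-vs-fluctuating question without comparing full dumps by hand is to sample used heap after a GC at intervals and check the trend. The following is a minimal sketch using only standard JDK management APIs; the function names, the sample count, and the one-second interval are hypothetical, and `System.gc()` is only a hint to the JVM, so the numbers are approximate.

```kotlin
import com.sun.management.HotSpotDiagnosticMXBean
import java.lang.management.ManagementFactory

/** Used heap in bytes after requesting a GC, so short-lived garbage is mostly excluded. */
fun usedHeapAfterGc(): Long {
    System.gc() // a request, not a guarantee
    return ManagementFactory.getMemoryMXBean().heapMemoryUsage.used
}

/** True when there are at least two samples and each is larger than the one before. */
fun isStrictlyIncreasing(samples: List<Long>): Boolean =
    samples.size >= 2 && samples.zipWithNext().all { (previous, next) -> next > previous }

/**
 * Writes a heap dump of live objects to [path] (HotSpot JVMs only).
 * The file must not already exist; use a name ending in ".hprof".
 */
fun dumpLiveHeap(path: String) {
    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean::class.java)
        .dumpHeap(path, true)
}

fun main() {
    val samples = mutableListOf<Long>()
    repeat(5) {
        samples += usedHeapAfterGc()
        Thread.sleep(1_000) // hypothetical interval; minutes are more realistic
    }
    println("samples (bytes): $samples")
    println("strictly increasing: ${isStrictlyIncreasing(samples)}")
}
```

A series that rises and falls points to ordinary allocation churn, while one that only ever rises across many samples is worth a closer look with `dumpLiveHeap`.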
It's also possible the concurrency method my test code uses differs significantly from your own. Can you provide minimal sample code which reproduces the problem?
Lastly, my test code ran with default JVM settings under OpenJDK 11. Can you confirm any non-default JVM settings you're using, particularly that might affect memory, threading, or garbage collection?
Thank you.
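For the JVM settings question above, a small sketch that prints the flags and collectors actually in effect inside the running container may be easier than reading them off the deployment config. It uses only standard `java.lang.management` APIs; `jvmSettingsReport` is a hypothetical helper name.

```kotlin
import java.lang.management.ManagementFactory

/** Collects the JVM details most relevant to memory, threading, and GC behavior. */
fun jvmSettingsReport(): String {
    val runtime = ManagementFactory.getRuntimeMXBean()
    val collectors = ManagementFactory.getGarbageCollectorMXBeans().map { it.name }
    return buildString {
        appendLine("JVM: ${runtime.vmName} ${runtime.vmVersion}")
        // Arguments passed to the JVM itself, such as -Xmx or -XX:+UseG1GC.
        appendLine("Flags: ${runtime.inputArguments}")
        appendLine("Max heap (bytes): ${Runtime.getRuntime().maxMemory()}")
        appendLine("Processors: ${Runtime.getRuntime().availableProcessors()}")
        appendLine("Collectors: $collectors")
    }
}

fun main() {
    print(jvmSettingsReport())
}
```

The processor count and max heap are worth including because, in a container, they may differ from the host's values and can influence which defaults the JVM picks.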
It looks like this issue has not been active for more than 5 days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.