opentelemetry-java-instrumentation

TracingSubscriber excessive memory usage - 81% memory of io.lettuce.core.RedisPublisher$SubscriptionCommand

Open SimoneGiusso opened this issue 1 year ago • 3 comments

Describe the bug

I wouldn't call it a bug, but it is something unexpected. Under high load our reactive service goes OOM. The cause is the CommandHandler#stack deque in lettuce-core (please read to the end to find out why I opened the issue here). I've already read this and about the possibility of bounding the queue. With a bounded queue the service won't go OOM, but an exception will still be thrown.
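To make the trade-off above concrete, here is a minimal sketch of a bounded command queue in plain Java. It is an illustration only: `BoundedCommandQueue` is a hypothetical class, not Lettuce's `CommandHandler`. In Lettuce itself the bound is, as far as I know, configured through `ClientOptions.builder().requestQueueSize(...)`. The sketch shows why bounding the queue trades an OOM for an exception: once the limit is reached, new commands are rejected instead of accumulating.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; this is not Lettuce's CommandHandler. */
final class BoundedCommandQueue<T> {
    private final Deque<T> stack = new ArrayDeque<>();
    private final int maxSize;

    BoundedCommandQueue(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Enqueues a command, or fails fast when the bound is reached. */
    synchronized void enqueue(T command) {
        if (stack.size() >= maxSize) {
            // Memory stays bounded, but the caller now has to handle this error.
            throw new IllegalStateException("Request queue size exceeded: " + maxSize);
        }
        stack.addLast(command);
    }

    /** Removes the oldest pending command (e.g. once its response arrived); null if empty. */
    synchronized T complete() {
        return stack.pollFirst();
    }

    synchronized int size() {
        return stack.size();
    }
}
```

With a bound of 2, a third `enqueue` call throws `IllegalStateException` while the two pending commands stay queued, so memory use is capped at the cost of failed requests under load.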

I have a blocking version of the same service and, under the same load, it doesn't go OOM, probably because concurrent requests are capped at 200 (the Tomcat default). Anyway, a reactive stack that can't handle the same load as a non-reactive stack is unexpected, right?
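The effect of that cap can be sketched with a semaphore. This is a hedged illustration, not Tomcat's actual implementation: `InFlightLimiter` is a hypothetical helper, and the value 200 is taken from the Tomcat default mentioned above. The point is that a fixed number of permits limits how many requests (and therefore how many pending Redis commands) can exist at once, whereas an uncapped reactive pipeline has no such limit unless one is added.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps the number of calls that may be in flight at once. */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks until a permit is free, runs the work, then returns the permit. */
    <T> T call(Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    /** Number of additional calls that could start right now. */
    int available() {
        return permits.availablePermits();
    }
}
```

Blocking on a semaphore is not appropriate inside a reactive pipeline. There, a comparable cap is usually expressed through operator-level concurrency limits, for example the `concurrency` argument of Reactor's `Flux.flatMap`.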

I had a look at the thread dump:

Screenshot 2024-02-05 at 10 19 11

This variable is taking almost 500MB of memory (~80% of the total available memory). Looking at the stored commands shows that:

Screenshot 2024-02-05 at 10 21 49

TracingSubscriber makes up around 81% of the total command memory - 85KB. I opened the issue here and not on lettuce-core because the problem seems to be the excessive amount of memory this object takes compared to the command itself.

I ran the same load test without the OpenTelemetry Java agent, and indeed the service didn't go OOM.

Steps to reproduce

Run a load test on an instrumented reactive service using the reactive lettuce client.

Expected behavior

I'd expect less memory used by TracingSubscriber

Actual behavior

TracingSubscriber takes 81% of the total memory allocated for the Redis command

Javaagent or library instrumentation version

1.29.0

Environment

lettuce-core:6.1.9

Additional context

No response

SimoneGiusso avatar Feb 05 '24 09:02 SimoneGiusso

Could you provide a sample application along with any instructions needed to reproduce the issue?

laurit avatar Feb 05 '24 12:02 laurit

Hello, as soon as I find the time I'll try to provide an example. Just as additional info: I tried limiting the CommandHandler request queue size, and even though the total CommandHandler size decreased, there is still huge memory consumption (causing OOM), apparently still from OpenTelemetry in other objects:

Screenshot 2024-02-06 at 09 39 33 Screenshot 2024-02-06 at 09 40 12

For now I just deactivated it.

SimoneGiusso avatar Feb 06 '24 08:02 SimoneGiusso

Hi @SimoneGiusso,

As @laurit mentioned, a reproducer app would be helpful. Alternatively, you could provide the heap dump file so we can do our own analysis.

It is hard to say without a heap dump file, but I think most of the memory used by the TracingSubscriber instances is taken up by io.opentelemetry.context.Context instances (e.g. io.opentelemetry.context.ArrayBasedContext).

serkan-ozal avatar Mar 16 '24 19:03 serkan-ozal