micronaut-core
Memory leak on Micronaut HTTP server
Expected Behavior
No memory leak.
Actual Behaviour
Heap histograms show a potential memory leak. The following part of the heap histogram is relevant:
1: 57108507 1370604168 io.micronaut.core.execution.DelayedExecutionFlowImpl$Map
2: 57108505 1370604120 io.micronaut.core.execution.DelayedExecutionFlowImpl$OnErrorResume
3: 38072339 913736136 io.micronaut.core.execution.DelayedExecutionFlowImpl$FlatMap
4: 19036169 456868056 io.micronaut.core.execution.DelayedExecutionFlowImpl$OnComplete
There are more than 50 million instances of io.micronaut.core.execution.DelayedExecutionFlowImpl$Map in memory! And this heap histogram was taken on an application with very few requests (so there cannot be 50 million files currently uploading).
I think it may be related to this endpoint, which binds one part with @Part Publisher<StreamingFileUpload> files and then uses a raw HttpRequest<?> inputs, as we have parts both as files and as String attributes.
@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Parameter(description = "The inputs") HttpRequest<?> inputs,
    @Parameter(description = "The inputs of type file") @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    Map<String, Object> inputMap = (Map<String, Object>) inputs.getBody(Map.class).orElse(null);
    // do something with the files ...
    return null; // placeholder for the real handler body
}
The memory leak is new in Micronaut 4. In Micronaut 3 we bound the body multiple times: once as a part, as today, and once as an @Body Map<String, Object> inputMap, which is no longer possible in Micronaut 4.
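For context, a sketch of what that Micronaut 3 signature looked like, reconstructed from the description above (the exact parameter names are assumed):

// Micronaut 3 style: the multipart body bound twice, once as a @Part publisher
// (as today) and once as an @Body map of the attribute parts. This double
// binding is no longer possible in Micronaut 4.
@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Body Map<String, Object> inputMap,
    @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    // do something with the files ...
    return null; // placeholder
}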
Reference GitHub discussion: https://github.com/micronaut-projects/micronaut-core/discussions/10662
Steps To Reproduce
No response
Environment Information
- Operating System: Linux 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 5 21:19:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Java: OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1)
Example Application
No response
Version
4.3.4
I don't know if it is of any help, but I noticed in a heap dump that inside DelayedExecutionFlowImpl there is a head attribute which contains a next attribute, which contains a next attribute... recursively, without an apparent end. It looks like all the DelayedExecutionFlowImpl steps are the next of a parent one...
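To make that observation concrete, here is a deliberately simplified, hypothetical sketch (the class and field names below are illustrative, not Micronaut's actual implementation) of how such a head -> next -> next ... chain retains every step for as long as the head itself stays reachable:

// Hypothetical illustration only -- NOT the actual DelayedExecutionFlowImpl code.
// It shows how a linked chain of steps stays on the heap for as long as the
// head is referenced, e.g. from a long-lived request-scoped lambda.
public class ChainRetentionSketch {

    static final class FlowStep {
        final String name; // e.g. "Map", "OnErrorResume", "FlatMap"
        FlowStep next;     // strong reference to the following step

        FlowStep(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        FlowStep head = new FlowStep("head");
        FlowStep tail = head;
        // Steps keep being appended, but the chain is never completed or cleared:
        for (int i = 0; i < 1_000_000; i++) {
            tail.next = new FlowStep(i % 2 == 0 ? "Map" : "OnErrorResume");
            tail = tail.next;
        }
        // While 'head' stays reachable, all 1,000,000 steps stay on the heap.
        System.out.println("chain built; head retains the whole chain");
    }
}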
cc @yawkat
More information to help diagnose the issue.
A single StreamingByteBody is holding 6 million DelayedExecutionFlowImpl$OnErrorResume objects through a RequestLifecycle lambda, retaining 1.6 GB.
Just as raw information: our whole application is broken due to this memory leak and customers and users are complaining; we tried multiple workarounds with no success 😭 We also tried to make a PR, but the HTTP server part is really complex for newcomers. If you have any workaround advice, it would be awesome.
@tchiotludo please give us some way to reproduce this. The form/multipart code is very complex and I don't see a starting point for debugging here.
@yawkat it's very problematic as I didn't succeed in reproducing the problem.
That's why I added as much information as I could; users seem not to use form/multipart that much, and the memory leak points to RequestLifecycle, so I'm not sure it is linked to form/multipart at all.
I can ask if I can share the dump if you want, but as a memory dump can contain sensitive data, I need to check first with the user and share it privately.
I can ask our users to provide more information but creating a reproducer seems to be very complex.
you could try setting micronaut.server.netty.server-type: full_content
Thanks @yawkat, we will test it; meanwhile I'll try my best to make a reproducer.
Hello. We don't use multipart data at all. Recently I deployed a new service that answers only health checks, Prometheus metrics, and rare POSTs with data to store in Mongo. It is a very simple microservice, so I gave it 0.5 GB of RAM, and I see an OOM there every day or two. We use MN 4.2.0, Netty, NO GraalVM, and Project Reactor everywhere. I'll try to investigate a bit deeper later.
@yawkat we cannot use micronaut.server.netty.server-type: full_content; it crashes for all requests with:
2024-04-09 11:33:36,466 WARN default-nioEventLoopGroup-1-3 io.netty.channel.ChannelInitializer Failed to initialize a channel. Closing: [id: 0x646fd7cb, L:/[0:0:0:0:0:0:0:1]:8080 - R:/[0:0:0:0:0:0:0:1]:48850]
java.lang.IllegalArgumentException: maxContentLength : -2147483648 (expected: >= 0)
at io.netty.util.internal.ObjectUtil.checkPositiveOrZero(ObjectUtil.java:144)
at io.netty.handler.codec.MessageAggregator.validateMaxContentLength(MessageAggregator.java:88)
at io.netty.handler.codec.MessageAggregator.<init>(MessageAggregator.java:77)
at io.netty.handler.codec.http.HttpObjectAggregator.<init>(HttpObjectAggregator.java:128)
at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertMicronautHandlers(HttpPipelineBuilder.java:608)
at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertHttp1DownstreamHandlers(HttpPipelineBuilder.java:638)
at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.configureForHttp1(HttpPipelineBuilder.java:380)
at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.initChannel(HttpPipelineBuilder.java:299)
at io.micronaut.http.server.netty.NettyHttpServer$Listener.initChannel(NettyHttpServer.java:892)
at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:1130)
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:840)
@katoquro to check if it's the same issue, you can try the following command to see if the same objects are accumulating:
jmap -histo:live <pid> | grep io.micronaut.core.execution
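For example, assuming jmap comes from the same JDK as the running process (the snapshot file names here are just an illustration), take two histograms a few minutes apart and compare the instance counts:

jmap -histo:live <pid> > histo-1.txt
# let the application serve traffic for a while
jmap -histo:live <pid> > histo-2.txt
# if the counts for these classes keep growing between snapshots, they are leaking
grep io.micronaut.core.execution histo-1.txt
grep io.micronaut.core.execution histo-2.txt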
@loicmathieu full_content only works if you lower your max-request-size to something that fits in memory (the maxContentLength : -2147483648 in the stack trace suggests the configured limit overflows the int that HttpObjectAggregator accepts).
@yawkat one user confirms that using the following configuration fixes the issue (or works around it):
micronaut:
  server:
    max-request-size: 1GB
    netty:
      server-type: full_content
@yawkat with this configuration, files of more than 1GB lead to requests that seem to be "blocked forever" without an exception. So it's a workaround for some of our users, but not a long-term solution.
Do you still need a reproducer (I'm working on one but still haven't managed to reproduce the issue)?
Yes, I still need a reproducer, either from you or from @katoquro.
full_content buffers the full request and bypasses most places that use DelayedExecutionFlow, but it's not recommended for permanent use.
@loicmathieu at first glance, not my case. The micro has been running for 5 hours:
4245: 1 24 io.micronaut.core.execution.ImperativeExecutionFlowImpl
I will look for a leak in another place 🤔
@katoquro remove the grep and look at the most common objects in the histogram: jmap -histo:live <pid>. Take multiple histograms and check which objects grow in number; this could be an easy way to find a leak.
And if it's a different leak, better to open a new issue ;)
Can you analyze the memory dump and see what is being leaked? You can try https://eclipse.dev/mat/
@dstepanov @loicmathieu
I think my case is really different. I have the following graph, where the green line is the total committed heap reported by Micrometer (sum of jvm_memory_committed_bytes) and the yellow line is the memory consumed by the Java process, taken from /proc/<pid>/stat.
It's something outside of heap, non-heap, etc. ... 🤔
@loicmathieu any luck on a reproducer?
@graemerocher unfortunately no; that's why I added as much information as I could.
@loicmathieu is there a way to run Kestra locally to reproduce?
@graemerocher yes, you can either run it from its repository or its docker image.
But what really annoys me is that I cannot reproduce it myself: some users report the issue, and I tried to set up Kestra locally with the same configuration and use it with the same scenario, but didn't succeed in triggering the issue.
I'll try to take some time this week to try to reproduce the issues I opened lately.
ok thanks