micronaut-core
Memory leak on Micronaut HTTP server
Expected Behavior
No memory leak.
Actual Behaviour
Heap histograms show a potential memory leak. The following part of the heap histogram is relevant:
1: 57108507 1370604168 io.micronaut.core.execution.DelayedExecutionFlowImpl$Map
2: 57108505 1370604120 io.micronaut.core.execution.DelayedExecutionFlowImpl$OnErrorResume
3: 38072339 913736136 io.micronaut.core.execution.DelayedExecutionFlowImpl$FlatMap
4: 19036169 456868056 io.micronaut.core.execution.DelayedExecutionFlowImpl$OnComplete
There are more than 50 million instances of io.micronaut.core.execution.DelayedExecutionFlowImpl$Map in memory! And this heap histogram was taken on an application with very few requests (so there cannot be 50 million files currently uploading).
I think it may be related to this endpoint, which binds one part with @Part Publisher<StreamingFileUpload> files and then uses a raw HttpRequest<?> inputs, as we have parts both as files and as String attributes.
@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Parameter(description = "The inputs") HttpRequest<?> inputs,
    @Parameter(description = "The inputs of type file") @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    Map<String, Object> inputMap = (Map<String, Object>) inputs.getBody(Map.class).orElse(null);
    // do something with the files ...
    return null; // placeholder for the real handler body
}
The memory leak is new in Micronaut 4. In Micronaut 3 we bound the body multiple times: once as a part, as today, and once as an @Body Map<String, Object> inputMap, which is no longer possible in Micronaut 4.
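For context, a sketch of what that Micronaut 3 signature looked like, reconstructed from the description above (the exact parameter names are assumed):

// Micronaut 3 style: the multipart body bound twice, once as a @Part publisher
// (as today) and once as an @Body map of the attribute parts. This double
// binding is no longer possible in Micronaut 4.
@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Body Map<String, Object> inputMap,
    @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    // do something with the files ...
    return null; // placeholder
}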
Reference GitHub discussion: https://github.com/micronaut-projects/micronaut-core/discussions/10662
Steps To Reproduce
No response
Environment Information
- Operating System: Linux 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 5 21:19:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Java: OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1)
Example Application
No response
Version
4.3.4
I don't know if it is of any help, but I noticed in a heap dump that inside DelayedExecutionFlowImpl there is a head attribute which contains a next attribute, which contains a next attribute... recursively, without an apparent end. It looks like all the DelayedExecutionFlowImpl steps are the next of a parent one...
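To make that observation concrete, here is a deliberately simplified, hypothetical sketch (the class and field names below are illustrative, not Micronaut's actual implementation) of how such a head -> next -> next ... chain retains every step for as long as the head itself stays reachable:

// Hypothetical illustration only -- NOT the actual DelayedExecutionFlowImpl code.
// It shows how a linked chain of steps stays on the heap for as long as the
// head is referenced, e.g. from a long-lived request-scoped lambda.
public class ChainRetentionSketch {

    static final class FlowStep {
        final String name; // e.g. "Map", "OnErrorResume", "FlatMap"
        FlowStep next;     // strong reference to the following step

        FlowStep(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        FlowStep head = new FlowStep("head");
        FlowStep tail = head;
        // Steps keep being appended, but the chain is never completed or cleared:
        for (int i = 0; i < 1_000_000; i++) {
            tail.next = new FlowStep(i % 2 == 0 ? "Map" : "OnErrorResume");
            tail = tail.next;
        }
        // While 'head' stays reachable, all 1,000,000 steps stay on the heap.
        System.out.println("chain built; head retains the whole chain");
    }
}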
cc @yawkat
More information to help diagnose the issue.
A single StreamingByteBody is holding 6 million DelayedExecutionFlowImpl$OnErrorResume objects through a RequestLifecycle lambda, retaining 1.6 GB.
Just as raw information: our whole application is broken due to this memory leak and customers and users are complaining; we tried multiple workarounds with no success 😭 We also tried to make a PR, but the HTTP server part is really complex for newcomers. If you have any workaround advice, it would be awesome.
@tchiotludo please give us some way to reproduce this. The form/multipart code is very complex and I don't see a starting point for debugging here.
@yawkat it's very problematic as I didn't succeed in reproducing the problem.
That's why I added as much information as I could; users seem not to use form/multipart that much, and the memory leak points to RequestLifecycle, so I'm not sure it is linked to form/multipart at all.
I can ask if I can share the dump if you want, but as a memory dump can contain sensitive data, I need to check first with the user and share it privately.
I can ask our users to provide more information but creating a reproducer seems to be very complex.
you could try setting micronaut.server.netty.server-type: full_content
Thanks @yawkat, we will test it; meanwhile I'll try my best to make a reproducer.
Hello. We don't use multipart data at all. Recently I deployed a new service that answers only health checks, Prometheus metrics, and rare POSTs with data to store in Mongo. It is a very simple microservice, so I gave it 0.5 GB of RAM, and I see an OOM there every day or two. We use MN 4.2.0, Netty, NO GraalVM, and Project Reactor everywhere. I'll try to investigate a bit deeper later.
@yawkat we cannot use micronaut.server.netty.server-type: full_content; it crashes for all requests with:
2024-04-09 11:33:36,466 WARN default-nioEventLoopGroup-1-3 io.netty.channel.ChannelInitializer Failed to initialize a channel. Closing: [id: 0x646fd7cb, L:/[0:0:0:0:0:0:0:1]:8080 - R:/[0:0:0:0:0:0:0:1]:48850]
java.lang.IllegalArgumentException: maxContentLength : -2147483648 (expected: >= 0)
at io.netty.util.internal.ObjectUtil.checkPositiveOrZero(ObjectUtil.java:144)
at io.netty.handler.codec.MessageAggregator.validateMaxContentLength(MessageAggregator.java:88)
at io.netty.handler.codec.MessageAggregator.<init>(MessageAggregator.java:77)
at io.netty.handler.codec.http.HttpObjectAggregator.<init>(HttpObjectAggregator.java:128)
at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertMicronautHandlers(HttpPipelineBuilder.java:608)
at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertHttp1DownstreamHandlers(HttpPipelineBuilder.java:638)
at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.configureForHttp1(HttpPipelineBuilder.java:380)
at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.initChannel(HttpPipelineBuilder.java:299)
at io.micronaut.http.server.netty.NettyHttpServer$Listener.initChannel(NettyHttpServer.java:892)
at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:1130)
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:840)
@katoquro to check if it's the same issue, you can try the following command to see if the same objects are accumulating:
jmap -histo:live <pid> | grep io.micronaut.core.execution
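For example, assuming jmap comes from the same JDK as the running process (the snapshot file names here are just an illustration), take two histograms a few minutes apart and compare the instance counts:

jmap -histo:live <pid> > histo-1.txt
# let the application serve traffic for a while
jmap -histo:live <pid> > histo-2.txt
# if the counts for these classes keep growing between snapshots, they are leaking
grep io.micronaut.core.execution histo-1.txt
grep io.micronaut.core.execution histo-2.txt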
@loicmathieu full_content only works if you lower your max-request-size to something that fits in memory (the maxContentLength : -2147483648 in the stack trace suggests the configured limit overflows the int that HttpObjectAggregator accepts).
@yawkat one user confirms that using the following configuration fixes the issue (or works around it):
micronaut:
  server:
    max-request-size: 1GB
    netty:
      server-type: full_content
@yawkat with this configuration, files of more than 1GB lead to requests that seem to be "blocked forever" without an exception. So it's a workaround for some of our users, but not a long-term solution.
Do you still need a reproducer (I'm working on one but still haven't managed to reproduce the issue)?
Yes, I still need a reproducer, either from you or from @katoquro.
full_content buffers the full request and bypasses most places that use DelayedExecutionFlow, but it's not recommended for permanent use.
@loicmathieu at first glance, not my case. The micro has been running for 5 hours:
4245: 1 24 io.micronaut.core.execution.ImperativeExecutionFlowImpl
I will look for a leak in another place 🤔
@katoquro remove the grep and look at the most common objects in the histogram: jmap -histo:live <pid>. Take multiple histograms and check which objects grow in number; this could be an easy way to find a leak.
And if it's a different leak, better to open a new issue ;)
Can you analyze the memory dump and see what is being leaked? You can try https://eclipse.dev/mat/
@dstepanov @loicmathieu
I think my case is really different. I have the following graph, where the green line is the total committed heap reported by Micrometer (sum of jvm_memory_committed_bytes) and the yellow line is the memory consumed by the Java process, taken from /proc/<pid>/stat.
It's something outside of heap, non-heap, etc. ... 🤔
@loicmathieu any luck on a reproducer?
@graemerocher unfortunately no; that's why I added as much information as I could.
@loicmathieu is there a way to run Kestra locally to reproduce?
@graemerocher yes, you can either run it from its repository or its docker image.
But what really annoys me is that I cannot reproduce it myself: some users report the issue, and I tried to set up Kestra locally with the same configuration and use it with the same scenario, but didn't succeed in triggering the issue.
I'll try to take some time this week to try to reproduce the issues I opened lately.
ok thanks