Direct memory leak

Open masmc05 opened this issue 1 year ago • 13 comments

Describe the bug

Hi, after updating to the latest version of Geyser, our proxy started crashing because all 30 GB of allocated direct (non-heap) memory was used up. When I looked into the heap dump, I saw that Geyser is holding tens of millions of direct byte buffers.
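
One way to watch for this kind of growth from inside the JVM is the standard `BufferPoolMXBean`. The following is a minimal sketch, assuming the buffers are allocated through `ByteBuffer.allocateDirect`. The class name `DirectBufferStats` is made up for illustration and none of this is Geyser code. Native memory that a library reserves by other means may not show up in this pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Geyser: reads the JVM's "direct" buffer pool statistics.
class DirectBufferStats {

    /** Returns the JVM's "direct" buffer pool bean, or null if the JVM does not expose one. */
    public static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        if (pool == null) {
            System.out.println("This JVM does not expose a 'direct' buffer pool.");
            return;
        }

        long before = pool.getMemoryUsed();

        // Allocate 1 MiB off-heap. It stays reserved for as long as 'buf' is reachable.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);

        long after = pool.getMemoryUsed();
        System.out.printf("direct buffers: count=%d, used before=%d bytes, used after=%d bytes%n",
                pool.getCount(), before, after);

        // Touch the buffer after measuring so it is still reachable at that point.
        buf.put(0, (byte) 1);
    }
}
```

Logging `getCount()` and `getMemoryUsed()` periodically on a running proxy would show whether the number of direct buffers keeps climbing between restarts, which is the symptom described here.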

To Reproduce

Run the server with players for some time

Expected behaviour

Direct memory should not fill up with direct byte buffers.

Screenshots / Videos

(Four screenshots were attached to the original issue.)

Server Version and Plugins

No response

Geyser Dump

https://dump.geysermc.org/raX3pXIzNSQPBa421wAJD0FDycAJtWve

Geyser Version

2.2.2-SNAPSHOT (git-master-e1e09a6)

Minecraft: Bedrock Edition Device/Version

No response

Additional Context

Previous version was

Loading Geyser version 2.2.2-SNAPSHOT (git-master-19a3dc3)

masmc05 avatar Mar 20 '24 15:03 masmc05

Does it happen consistently on this version? Is there the same behavior using this older version? https://github.com/GeyserMC/Geyser/actions/runs/8299432296

Camotoy avatar Mar 20 '24 15:03 Camotoy

Does it happen consistently on this version?

This happened two days after we started the server on the updated version. I'll report back on Monday whether it happened again (there will be a restart on Friday, so I can't say sooner).

Is there the same behavior using this older version https://github.com/GeyserMC/Geyser/actions/runs/8299432296

Sadly, I don't think we can let 600 online Bedrock players join just to test whether the server will crash on this specific build. We may reduce the direct memory allocation from 30 GB to something lower, but purposely making gameplay less stable isn't good for us.

masmc05 avatar Mar 20 '24 15:03 masmc05

The crash keeps repeating. We tried that version and the version before it, and the leak was still present in both.

masmc05 avatar Mar 23 '24 20:03 masmc05

We are seeing the same issue too. If you need any data from us, please let us know. From our limited testing, we seem to notice it after players disconnect and reconnect.

We are seeing this with Geyser Standalone. My guess is that the leak is coming from the join logic.

NullOrNaN avatar Mar 24 '24 10:03 NullOrNaN

Just to follow up on @NullOrNaN's comment: I did some testing with Standalone 2.2.2-SNAPSHOT running with 512 MB of memory. Memory usage increased by around 26 MB every time I joined, and it wasn't released on disconnect.

DarkeFae avatar Mar 24 '24 10:03 DarkeFae

Unfortunately, there's not much we can do with this information alone. We will need a heap dump of this occurring, preferably taken at a time when memory usage is abnormally high. Are you able to send across your heap dump, @masmc05? Feel free to reach out to us privately on Discord if you don't want to share it publicly.

Just to follow up on @NullOrNaN's comment: I did some testing with Standalone 2.2.2-SNAPSHOT running with 512 MB of memory. Memory usage increased by around 26 MB every time I joined, and it wasn't released on disconnect.

This is expected behavior in Java programs. Garbage collection is quite a complicated subject, and some objects will linger longer than others, depending on which space in the GC is cleared (especially when you consider old gen, the different regions, etc.). In a nutshell, memory usage should be expected to climb every time new objects are allocated. Once a memory "limit" is hit, the garbage collector runs, the objects are reclaimed, and memory usage drops, then climbs back up as the cycle repeats.
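
The difference between memory that is merely waiting for the collector and memory that is still referenced can be seen in a small experiment. This is a minimal sketch, not Geyser code: the class `HeapSawtooth` and its methods are invented for illustration, and `System.gc()` is only a request, so the second figure is not guaranteed to drop on every JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows heap usage while objects are referenced and after release.
class HeapSawtooth {
    public static final int CHUNK = 1 << 20; // 1 MiB per array

    /** Approximate number of heap bytes currently in use. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Allocates 'chunks' arrays of 1 MiB each and keeps them reachable via the returned list. */
    public static List<byte[]> allocate(int chunks) {
        List<byte[]> held = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            held.add(new byte[CHUNK]);
        }
        return held;
    }

    public static void main(String[] args) {
        List<byte[]> held = allocate(16);

        // While 'held' is reachable, the collector cannot reclaim these arrays.
        System.out.println("while referenced: " + usedHeapBytes() / CHUNK
                + " MiB in use (" + held.size() + " chunks held)");

        // Dropping the last reference turns the arrays into garbage.
        held = null;
        System.gc(); // a hint only; the JVM decides when to actually collect

        System.out.println("after release:    " + usedHeapBytes() / CHUNK + " MiB in use");
    }
}
```

If usage keeps growing even after collections have run, something is still holding references, and a heap dump is the usual way to find out what.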

If anyone else is having this issue or something similar, please send across a heap dump. For issues such as these, other metrics (e.g. a Spark report or a Geyser dump) will not be very helpful in tracking down the problem.

Redned235 avatar Mar 24 '24 11:03 Redned235

The problem is that Geyser needs to allow the memory to be collected. Darke was trying to tell you that it isn't being garbage collected. The process can end up using more memory than a Docker container allows, which will effectively crash the container (or a live system) once it runs out of RAM.

We effectively have a DoS bug here.

I've taken heap dumps from a live Geyser Standalone instance and a private one (used for testing only) for your reference. Where would you like me to upload them? As you're likely aware, the files are large.

NullOrNaN avatar Mar 24 '24 12:03 NullOrNaN

Feel free to send me a friend request on Discord and you can DM it to me.

Redned235 avatar Mar 24 '24 14:03 Redned235

I sent a friend request and the dumps. Let me know here if you need anything else from me.

NullOrNaN avatar Mar 24 '24 23:03 NullOrNaN

Yeah, I needed to increase the allocation from 1 GB to 3 GB, and usage still keeps growing, so I have to restart the proxy daily because of Geyser.

bobhenl avatar Mar 29 '24 02:03 bobhenl

This is still present in the latest Geyser. We submitted dumps and were told that was all that was needed. Can we get an update on this as promised, as well as the label corrected?

NullOrNaN avatar Apr 03 '24 06:04 NullOrNaN

In that case can you send a new heap dump? I expect that some of the changes we made as part of the fixes for the vulnerability we posted on 2024-03-30 made the heap dump different.

Tim203 avatar Apr 03 '24 10:04 Tim203

I think the issue may have been fixed; at least memory is handled better than in previous versions.

bobhenl avatar Apr 03 '24 15:04 bobhenl

It seems to have fixed itself: after updating, it no longer happens.

masmc05 avatar May 06 '24 17:05 masmc05