Valkyrien-Skies-2

dismount_dead_entities/MixinLivingEntity causing Server Thread lockup, degraded chunk load/unload

Open codeHusky opened this issue 10 months ago • 10 comments

This issue occurs when only Valkyrien Skies and addons are installed and no other mods

  • [X] I have tested this issue and it occurs when no other mods are installed

Minecraft Version

1.20.x

Mod Loader

Forge

Issue description

During chunk load/unload, mobs with passengers in chunks that are unloading can freeze the server thread: performing the dismount check force-loads the very chunks that are being unloaded. This can cause multi-second freezes and, in the worst case, a server crash.

This also severely degrades the performance of server chunk loading/unloading, significantly delaying chunk generation and overall straining the server's MSPT.

This can also cause the server to hang at shutdown indefinitely as the getChunk calls never succeed.

Issue reproduction

Load/unload chunks in a world that contains mobs with passengers and watch profiling tools to see the server hang. I plan to release my HealthMonitor mod soon to make diagnosing these issues easier, but all it does is some basic automated profiling of the Server Thread.

Logs

https://gist.github.com/codeHusky/844a4a6044f96c7bc8e475ad8273b328

Shutdown Deadlock https://gist.github.com/codeHusky/196fe35e5903779a8882f4be19df3594

codeHusky avatar Apr 08 '24 20:04 codeHusky

Same with the Valkyrien Skies 2 (2.1.2-beta.1) (& Clockwork 0.1.13, but still the same without the addon), but for 1.19.2, both Forge and Fabric.

In case it helps someone track this down: there are also major problems with dimension mods. With VS2 installed, several mod interactions can go wrong, the main one being with any Nether-related mod. I can reproduce the same issue with VS2 + BetterNether/Cinderscapes/any other Nether mod, and with nearly every dimension mod I could test from Modrinth/CF.

When I teleport to the Nether for the first time, by command or through a Nether portal, TPS looks fine: no degraded server performance, everything works as expected. When leaving any dimension other than the Overworld, TPS drops to 0.6, and the server gets overloaded and deadlocks itself in the dismount_dead_entities/MixinLivingEntity$preDismountVehicle() function.

Spark shows this stacktrace (0.6 TPS, 1375+ MSPT, VS2 callback stuck): [image]

It seems that when the mixin tries to load/unload the chunk, the chunk never unloads, which makes the server stutter a lot. I'm not sure, but the problem is probably here: https://github.com/ValkyrienSkies/Valkyrien-Skies-2/blob/97bf648df1c3200dc5dcd4e94086d516113b736c/common/src/main/java/org/valkyrienskies/mod/mixin/feature/dismount_dead_entities/MixinLivingEntity.java#L39

Performance degrades even further when the Clockwork addon is installed.

I have the same issue with my modpack of 425 mods. A binary search got nowhere until VS2 and its addons were removed from the pack entirely; with VS2 removed, there were no issues. Also note that singleplayer works as expected; the problem only occurs on dedicated servers.

DenisMasterHerobrine avatar Apr 12 '24 07:04 DenisMasterHerobrine

Hi, I have a very similar issue on 1.19.2 with VS and Clockwork. Are you sure there is no way of resolving this on a server? These mods are the core of the gameplay, and these crashes are a real problem.

TotoUmbertoch72 avatar Apr 16 '24 11:04 TotoUmbertoch72

Hello! I noticed the exact same problem, and it is also a core mod for my modpack: https://spark.lucko.me/pdhyk3nfxu

DaftHunk avatar Apr 17 '24 17:04 DaftHunk

Removing the mixin solved my issue; the server has been working as intended for a week now. So far I haven't found a proper reimplementation of the mixin that doesn't get stuck, but dead entities seem to be handled just fine even without this mixin.

DenisMasterHerobrine avatar Apr 17 '24 18:04 DenisMasterHerobrine

Same issue here, very similar spark profile: https://spark.lucko.me/ur6cHkPR7N

I'm currently compiling a version with that mixin temporarily removed and will report back on whether that fixes the lag issue.

Jamalam360 avatar Jul 14 '24 14:07 Jamalam360

That build crashed, presumably because the main branch has changes that the latest released Eureka version hasn't been updated for yet. As such, I am unable to test.

Jamalam360 avatar Jul 14 '24 15:07 Jamalam360

As a hack, you can "disable" the mixin by opening the Valkyrien Skies jarfile with any archive tool (e.g. 7-zip), opening valkyrienskies-common.mixins.json, removing the "feature.dismount_dead_entities.MixinLivingEntity", line, and finally replacing the file in the jarfile.
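
For anyone who would rather script this than edit the jar by hand, here is a rough sketch of the same steps in Java. It is only an illustration, not an official tool: the class name MixinConfigPatcher is made up, and it assumes valkyrienskies-common.mixins.json sits at the root of the jar and that the entry is on a line of its own. Back up the jar first.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Valkyrien Skies.
final class MixinConfigPatcher {
    // The entry named in this thread, as it appears inside the JSON array.
    static final String ENTRY = "\"feature.dismount_dead_entities.MixinLivingEntity\"";

    // Drops every line that holds only the entry, with or without a trailing comma.
    // All other lines are kept unchanged.
    static String removeEntry(String json) {
        return json.lines()
                .filter(line -> {
                    String trimmed = line.strip();
                    return !(trimmed.equals(ENTRY) || trimmed.equals(ENTRY + ","));
                })
                .collect(Collectors.joining("\n"));
    }

    // Usage: java MixinConfigPatcher.java <path-to-valkyrienskies-jar>
    public static void main(String[] args) throws IOException {
        Path jar = Path.of(args[0]);
        // The JDK zip file system lets us rewrite a file inside the jar in place.
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of())) {
            // Assumption: the mixin config is at the jar root.
            Path config = fs.getPath("valkyrienskies-common.mixins.json");
            String patched = removeEntry(Files.readString(config, StandardCharsets.UTF_8));
            Files.writeString(config, patched, StandardCharsets.UTF_8);
        }
    }
}
```

One caveat: if the entry happened to be the last element of its array, the line before it would be left with a dangling comma that you would need to remove by hand.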

MichailiK avatar Jul 14 '24 15:07 MichailiK

That's what I'm doing right now. Hopefully it works

Jamalam360 avatar Jul 14 '24 15:07 Jamalam360

As far as I can tell, my issue is fixed with that mixin disabled (of course, disabling the mixin probably causes other issues). While flying around the Nether a lot with the mixin enabled, the server chugs down to a literal 1 TPS. With the mixin disabled, it holds 20 TPS.

Jamalam360 avatar Jul 14 '24 16:07 Jamalam360

Likewise, can confirm. Looking at the stacktrace from my custom watchdog, the issue lies here: https://github.com/ValkyrienSkies/Valkyrien-Skies-2/blob/97bf648df1c3200dc5dcd4e94086d516113b736c/common/src/main/java/org/valkyrienskies/mod/mixin/feature/dismount_dead_entities/MixinLivingEntity.java#L35

The entities' setRemoved methods (which then call dismount) are called AFTER the chunk has been unloaded, so getBlockState has to load the chunk again to find the blockstate.

This happens for every single entity that is ever loaded, locking up the server slowly over time.
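
To make the mechanism concrete, here is a toy model in plain Java. It is not Minecraft or VS2 code, and every name in it (FakeLevel, dismountUnguarded, dismountGuarded) is hypothetical. It only illustrates why a block-state lookup on an already-unloaded chunk forces a reload, and how checking whether the chunk is loaded before the lookup would avoid that. Whether such a guard is the right fix for the actual mixin is an assumption, not something confirmed in this thread.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these types exist in Minecraft or Valkyrien Skies.
final class DismountGuardSketch {

    // Hypothetical stand-in for a server level that tracks loaded chunks.
    static final class FakeLevel {
        final Set<Long> loadedChunks = new HashSet<>();
        int forcedLoads = 0;

        // Packs chunk coordinates into one key.
        static long key(int chunkX, int chunkZ) {
            return ((long) chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
        }

        boolean isChunkLoaded(int blockX, int blockZ) {
            return loadedChunks.contains(key(blockX >> 4, blockZ >> 4));
        }

        // Models a blocking lookup: a missing chunk is loaded on demand.
        String getBlockState(int blockX, int blockZ) {
            if (loadedChunks.add(key(blockX >> 4, blockZ >> 4))) {
                forcedLoads++; // in the real game this is the expensive part
            }
            return "air";
        }
    }

    // Mirrors the behavior described above: always looks up the block state.
    static void dismountUnguarded(FakeLevel level, int blockX, int blockZ) {
        level.getBlockState(blockX, blockZ);
    }

    // Sketch of a guard: skip the lookup when the chunk is already gone.
    static void dismountGuarded(FakeLevel level, int blockX, int blockZ) {
        if (!level.isChunkLoaded(blockX, blockZ)) {
            return;
        }
        level.getBlockState(blockX, blockZ);
    }
}
```

In this model, calling dismountUnguarded on an empty FakeLevel bumps forcedLoads, while dismountGuarded leaves the chunk unloaded.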

ByThePowerOfScience avatar Jul 18 '24 01:07 ByThePowerOfScience