SpongeForge
SpongeForge copied to clipboard
Async Block Updates with Real Time Clock causes watchdog activation with large amounts of water updates
I am currently running
-
SpongeForge version: spongeforge-1.12.2-2838-7.2.4-RC4058
-
Forge version: 14.23.5.2854
-
Java version: 1.8.0_212, IcedTea -Java VM Version: OpenJDK 64-Bit Server VM (mixed mode), IcedTea
-
Operating System: Linux (amd64) version 4.15.0-112-generic
-
Plugins/Mods: Coremods: -EnderCorePlugin (EnderCore-1.12.2-0.5.76-core.jar) - EnderIO Coremod -OpenModsCorePlugin (OpenModsLib-1.12.2-0.12.2.jar) - Openblocks Coremod -CoreMod (Aroma1997Core-1.12.2-2.0.0.2.jar) - Aroma Backup Mods: -Tinkers Construct 1.12.2-2.13.0.183 -Mantle 1.12-1.3.3.55 -EnderCore 1.12.2-0.5.76 -BDLib 1.14.3.12 -EnderIO 5.2.59 -EnderIOIntegrationTIC 5.2.59 -EnderIOBase 5.2.59 -EnderIOConduits 5.2.59 -EnderIOConduitsAppliedEnergistics 5.2.59 -EnderIOConduitsOpenComputers 5.2.59 -EnderIOConduitsRefinedStorage 5.2.59 -EnderIOIntegrationForestry 5.2.59 -EnderIOIntegrationTICLate 5.2.59 -EnderIOMachines 5.2.59 -EnderIOPowerTools 5.2.59 -EnderStorage 2.4.6.137 -Chisel MC1.12.2-1.0.2.45 -Buildcraft 7.99.24.6 -BuildcraftLib 7.99.24.6 -BuildcraftCore 7.99.24.6 -BuildcraftBuilders 7.99.24.6 -BuildcraftTransport 7.99.24.6 -BuildcraftSilicon 7.99.24.6 -BuildcraftCompat 7.99.24.6 -BuildcraftEnergy 7.99.24.6 -BuildcraftFactory 7.99.24.6 -BuildcraftRobotics 7.99.24.6 -AromaBackup 2.1.1.4 -AromaBackupRecovery 2.1.1.4 -Applied Energistics rv6-stable-7 -AE2Stuff 0.7.0.4 -Openblocks 1.8.1 -Just Enough Items 4.15.0.296 -ChickenChunks 2.4.2.74 -CodeChickenLib 3.2.3.358 -CraftingTweaks 8.1.9 -ExtraUtils2 1.0 -FastLeafDecay v14 -ForgeMultipartCBE2.6.2.83 -MinecraftMultipartCBE 2.6.2.83 -MicroblockCBE 2.6.2.83 -IronChest 1.12.2-7.0.67.844 -JourneyMap 1.12.2-5.7.1 -OpenBlocks 1.8.1 -OpenMods 0.12.2 -Pixelmon 8.0.2 -ProjectE 1.12.2-PE1.4.1 -ProjectEX 1.2.0.39 -SpeedyLadders 1.1.3 -Waystones 4.1.0
Issue Description After quarrying out 9-12 chunks of terrain with a single water source flooding the bottom 60+ layers with water, I removed the source block, and despite a small amount of lag, the server was fine, and then suddenly closed, with the watchdog timer reporting that a single tick had taken over 60 seconds. This was despite that for the last minute, everyone had been able to explore, mine, place blocks, use inventories, craft, etc, just fine, and the water was still updating normally.
This should not have triggered the watchdog, as no tick actually took that long (as apparent by everything else working fine).
We (the server members) suspect that the watchdog timer somehow was counting the total time to update the water, rather than the timer spent per tick to update the water.
Crashlogs: https://gist.github.com/fwyrl/63852b30dc47dff7bf892a0e80cd7396 https://gist.github.com/fwyrl/1ea8128433df4fc6b581607d6d8f3bb7
The ServerHangWatchdog is a Vanilla feature, unrelated to Sponge, unaware of whatever the server is doing at that time. It's more likely you've started with a single block update progressing exponentially with more and more block until it was too much for the server to handle. You can crash the base game with such massive water updates, even if Sponge amplify the issue, you should definitely educate your players to not do such lag machines. No water in quarries and no more than 32x32 wide quarries.
ServerHangWatchdog tracks Tick time, as far as I am aware. However the server experienced no significant TPS lag (>80% normal speed). Despite this, the watchdog triggered, saying that it had been a 60s tick.
Unless Sponge multithreads world update, there is no way that any given tick took even 1 second, much less 60. As I said, players were mining, crafting, placing blocks, etc. Machines were working, hoppers, pipes, a nearby quarry, were all functioning fine, and the water surface was updating fine too, right up until the server force-closed.
It's worth noting that it closed exactly 60 seconds after I picked up the source block, which implies that Sponge and/or Async+RealTimeClock is causing cascading fluid changes to be timed as a single tick, despite fluid updates (in vanilla, and at least appears to in Sponge as well) happening over multiple ticks.
This most definitely does NOT crash vanilla servers. I thought this sounded odd, since I've done much bigger sheet generators in the past and had no server crashes, and sure enough; I made a water sheet 3.5x3.5 chunks and over 150 blocks high in a vanilla 12.2 server, removed the single source block, and while the server suffered a lot of tick lag (~1 tick per second), it did not crash, or even come close, despite taking many minutes to complete the full cascade. If I made the surface of the water sheet stupidly big (over 20 chunks to a side), it might crash the server I put up, based on a rough tick time of 1 second for 3.5x3.5 chunks. but it's unreasonable for a 3x3 chunk sheet to crash a server.
I am not the server owner, I'm just filing the report for him because he doesn't feel like doing it.
We have worked around this issue temporarily by simply increasing watchdog time to 1 hour, since removing a single source block should never take that long to finish propagating downstream.
You could have 0.000001 % CPU load in average and still trigger the watchdog because tick took more than 60s. It's a watchdog, it's not a average TPS monitoring.
World update can only be single threaded. Sponge actively fixes mods doing bad multi-thread, not the other way around.
Long story short, you're in denial, can't help you.
I am aware it triggers on a single tick taking 60s. If you've properly read either of my posts, you may have noticed I mentioned in both that this didn't happen. No ticks even hit 1 second. There was no 60 second tick to trigger it, hence the bug report.
I am aware ticks are single-threaded; I was trying to point out how absurd it would be for the water update to take 60s while the rest of the server ran at about 18 tps consistently over that minute, and had no ticks over 1 second. Obviously, the water update isn't multithreaded, and so couldn't take 60 seconds for a tick without the rest of the updates pausing as well. Hence the bug report.
To Be clear, in case you missed it: I removed the source block. TPS slowed down slightly, but not a lot. Then, without warning,while functioning fine, with all machines working, blocks making drops, inventories updating, etc, the server suddenly closed, with the above crash. There was no 1 minute pause before the server killed itself. It went from fine to closed, instantly, and clearly should not have, since no tick could not have taken over 60 seconds.
Crash happened because a single tick took more than 60 s. This is a fact, whether you like it or not, it happened. Denying it won't help the situation and there's no point to discuss further.