lithium-fabric

Heavy redstone/mechanisms can lead to broken chunk loading within the same dimension

Open LucilleTea opened this issue 4 years ago • 16 comments

Originally reported by Andrews54757 on the Discord.

When the server is under heavy load (Andrews54757 reports that many kinds of load will trigger it; I've only reproduced it while running this ice farm), chunk loading within the same dimension will stop working properly. Only chunks very close to players will load, and nearby chunks will also unload when they shouldn't. A player travelling to this dimension through a portal is likely to overload the server, kicking everyone but not crashing it.

Here's a video of the chunk loading issue in 20w14a.

I've reproduced this issue in both 1.15.2 and 20w14a: in 1.15.2 running lithium 0.4.6 and optionally phosphor 0.5.2, and in 20w14a running lithium 0.5.0-alpha1 and phosphor 0.5.2.

In 1.15.2 I've found that using phosphor, and disabling both use_fast_shape_comparison and extend_block_shape_cache, will fix the issue - chunks will load properly and players won't be kicked. Andrews54757 reports that this wasn't enough to fix the problem.

LucilleTea avatar May 01 '20 05:05 LucilleTea

I am unable to reproduce this issue regardless of the configuration used. I suspect this might be caused by an overload condition and is not necessarily the fault of Lithium. The issues with chunks loading/unloading incorrectly look to be client-side (note that chunks can reappear and disappear in a single frame).

jellysquid3 avatar May 09 '20 00:05 jellysquid3

I'm leaving this issue as confirmed since multiple players have been able to reproduce the issue here, but since resolutions vary from person to person it's not clear what's the actual underlying bug in Lithium, if any.

jellysquid3 avatar May 09 '20 00:05 jellysquid3

I'm not convinced this issue is directly caused by Lithium. I've spent the last few days trying to narrow down what's going on, and it seems to be that the light engine is becoming so overburdened as to completely stall any tasks waiting for it to finish (chunk loading/saving, server shutdown, etc.)

In other words, Lithium provides such a significant speed-up to the game server that it becomes impossible for the light engine to keep up. Because the light engine has an unbounded queue, that queue will eventually grow to a ridiculous size and force a stall when anything queries the state of the light engine.

The initial user reports indicating that this was caused by the shape patches appear to have held only because those patches had such a large impact on server tick times. With the new optimizations landing in the 1.16.x branch, it seems disabling them no longer pushes tick times back up high enough to prevent the light engine from becoming overburdened.

I'm going to be adding diagnostic tracers into Lithium so users can be signaled when this overload condition occurs, but it's not clear what the best solution is for resolving it. The first thing that comes to mind is that we should block the game thread whenever the light update queue exceeds a certain size, but that could lead to rare and excessively long ticks.
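The blocking idea above can be sketched with a bounded queue. This is a minimal illustration, not Lithium or Minecraft code: `BoundedLightQueue`, `enqueue`, `tryEnqueue`, `drain`, and the `maxPending` limit are all hypothetical names chosen for the example. It relies only on the standard `java.util.concurrent.LinkedBlockingQueue`, whose `put` blocks the caller while the queue is at capacity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the light engine's update queue (not a real Minecraft class).
final class BoundedLightQueue {
    private final BlockingQueue<Runnable> pending;

    // maxPending is an assumed tuning knob: the most updates allowed to wait at once.
    BoundedLightQueue(int maxPending) {
        this.pending = new LinkedBlockingQueue<>(maxPending);
    }

    // Game thread: blocks while the queue is full, which is the "long tick" risk.
    void enqueue(Runnable lightUpdate) throws InterruptedException {
        pending.put(lightUpdate);
    }

    // Non-blocking variant: returns false instead of waiting when the queue is full.
    boolean tryEnqueue(Runnable lightUpdate) {
        return pending.offer(lightUpdate);
    }

    // Light thread: runs at most maxTasks queued updates and returns how many ran.
    int drain(int maxTasks) {
        int ran = 0;
        Runnable task;
        while (ran < maxTasks && (task = pending.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    int size() {
        return pending.size();
    }
}
```

With a limit in place the backlog can never exceed `maxPending`, at the cost that `enqueue` may hold up the game thread for as long as the light thread needs to catch up.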

jellysquid3 avatar May 10 '20 00:05 jellysquid3

Marking as critical as this issue is appearing more with recent improvements and can result in the server being abruptly killed by the watchdog, possibly creating invalid state on disk.

jellysquid3 avatar May 10 '20 00:05 jellysquid3

Bravo!!! Amazing work finding out the culprit of the problem! I am very impressed!

Andrews54757 avatar May 10 '20 03:05 Andrews54757

> The first thing that comes to mind is that we should block the game thread whenever the light update queue exceeds a certain size, but that could lead to rare and excessively long ticks.

This seems like it'll be necessary as a fail-safe no matter what. Perhaps allow the upper bound to be adjusted in the config. The threaded lighting implementation has a lot of problems resulting in a lot of glitchiness, with this issue mostly covered by MC-164281. I suspect that in a few years when Mojang gets around to it, they'll probably "fix" it in the same way.

mrgrim avatar Jun 25 '20 20:06 mrgrim

Before blocking the game thread completely, we could throttle it a bit. That might prevent huge sudden lag spikes.
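One way to picture throttling before blocking is a delay that ramps up with the backlog. The sketch below is only an assumption about how such a ramp could look: `LightQueueThrottle`, `delayMillis`, and the soft/hard limits are hypothetical names and parameters, and nothing here is taken from Lithium.

```java
// Hypothetical helper: maps the current light queue size to a per-tick delay.
final class LightQueueThrottle {
    private LightQueueThrottle() {}

    // Returns 0 at or below softLimit, maxDelayMillis at or above hardLimit,
    // and a linearly interpolated delay in between.
    static long delayMillis(int queueSize, int softLimit, int hardLimit, long maxDelayMillis) {
        if (softLimit < 0 || hardLimit <= softLimit || maxDelayMillis < 0) {
            throw new IllegalArgumentException("limits must satisfy 0 <= soft < hard, delay >= 0");
        }
        if (queueSize <= softLimit) {
            return 0L;
        }
        if (queueSize >= hardLimit) {
            return maxDelayMillis; // the caller would fall back to blocking here
        }
        return maxDelayMillis * (queueSize - softLimit) / (hardLimit - softLimit);
    }
}
```

For example, with an assumed soft limit of 1000, hard limit of 2000, and maximum delay of 50 ms, a backlog of 1500 updates gives a 25 ms delay, so the slowdown arrives gradually rather than as one very long tick.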

2No2Name avatar Jul 20 '20 21:07 2No2Name

what kind of help is wanted from this?

ImUrX avatar Jul 25 '20 20:07 ImUrX

A world that doesn't have much clutter but easily reproduces the crash would be helpful. At this point we just have to implement a workaround for the light queue becoming too long and test that using some test worlds.

2No2Name avatar Jul 30 '20 09:07 2No2Name

Dispensers are very slow; for example, a 64-item dropper only drops 32.

zLauch avatar Aug 30 '20 17:08 zLauch

> Dispensers are very slow for example a 64 item dropper only drops 32

Is that related to this issue? I'm following along the convo, but I think I'm misunderstanding something because I don't see how the light engine stalling is related to some dropper issue. Some more details would be super helpful, @zLauch :)

felix91gr avatar Sep 02 '20 06:09 felix91gr

Our server has been having this lockup as well. It's caused quite a few lighting glitches but hasn't appeared to do any severe damage otherwise. At the request of the AOF3 pack dev I had the server owner install the /suites/1732359139/artifacts/32814842 build. I have quite a lot of documentation available here: https://github.com/TeamAOF/All-of-Fabric-3/issues/232 though most of the useful stuff is in the logs.

We're willing to use experimental versions of the mod, but we haven't really found a reliable way to reproduce it. Logging into unloaded but already generated chunks seemed to cause it most often. We're going to keep using the earlier mentioned build.

If you wanna get in contact with me I'm in the discord by the username "LadyPotaty#3238"

MissPotato avatar Jan 03 '21 03:01 MissPotato

This issue also appears in vanilla. The current information I have is: vanilla does not wait for the light engine to finish working through its update queue except when loading chunks, so it is possible for the server to schedule light updates all the time while the light engine never manages to empty its queue. Chunk loading then waits forever for the light updates to finish, which is why you notice that chunk loading doesn't work. Lithium optimizes the server but not the light engine, so a mechanism that schedules lots of light updates might cause the light engine to lag behind like this more often. Some users stop experiencing this problem by also speeding up the light engine, e.g. by installing Phosphor.
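The effect can be illustrated with a toy model of the backlog. This is not a measurement of the real light engine: `LightBacklogModel`, `backlogAfter`, and the per-tick rates are made-up names and numbers used only to show how the queue behaves when updates are scheduled faster than they are processed.

```java
// Toy model (illustrative only): tracks how many light updates are left waiting.
final class LightBacklogModel {
    private LightBacklogModel() {}

    // Each tick the server schedules scheduledPerTick updates and the light
    // engine processes at most processedPerTick of whatever is queued.
    static long backlogAfter(int ticks, long scheduledPerTick, long processedPerTick) {
        long backlog = 0;
        for (int i = 0; i < ticks; i++) {
            backlog += scheduledPerTick;
            backlog -= Math.min(backlog, processedPerTick);
        }
        return backlog;
    }
}
```

With the assumed rates of 120 scheduled and 100 processed per tick, `backlogAfter(100, 120, 100)` returns 2000 and keeps growing with every further tick, so anything waiting for an empty queue never proceeds. At 80 scheduled per tick the backlog stays at 0.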

2No2Name avatar Jan 03 '21 16:01 2No2Name

It appears that the snapshot version didn't fix it, though it does seem to happen less often. I don't know if it's happening less often due to the unpredictable nature of it. It seems to happen most often when users join the world and during rocket-boosted elytra flight. It seems that it's also unable to save light levels to disk for some chunks, as every time we restart the server the same two chunks go pitch black, even after replacing torches. We do have phosphor as well.

Have you looked at what https://github.com/PaperMC/Paper does to handle this? I would imagine this is a bug they've encountered.

The owner of the server has stated that they're fine using experimental versions if you think playtesting could help expedite finding a solution.

MissPotato avatar Jan 04 '21 05:01 MissPotato

So is this mod so good it's breaking the game because it's TOO good/fast? That's impressive!

timelady-victorious avatar Mar 01 '21 09:03 timelady-victorious

Starlight may alleviate this issue. It's nearly 35x faster than the Vanilla lighting engine while generating chunks, and around 32x faster than Phosphor*.

CC @MissPotato.

* Unless I did my math wrong, that is what this graph seems to suggest.

Ocawesome101 avatar Dec 02 '21 04:12 Ocawesome101