Lootr icon indicating copy to clipboard operation
Lootr copied to clipboard

1.18.2 Lootr TPS drop

Open XionioXMaster opened this issue 3 years ago • 2 comments

https://spark.lucko.me/252KImLJ8k

noobanidus.mods.lootr.block.entities.TileTicker.serverTick() 21.55% Last 1.18.2 version from curseforge

XionioXMaster avatar Oct 04 '22 14:10 XionioXMaster

Could you give me some more context about this? I realise that this is quite a higher number, but the overall TPS of the server seems to be stable at 20. Is this during a period of significant lag? How long was the sample? Is this happening constantly, or was the sample taken during world generation?

I do have some ideas for improving the performance of this piece of code, but unfortunately, due to its nature, it's always going to be slightly more expensive; the alternative is accidentally loading chunks on the main thread while they're still being generated, which would produce tremendously more lag.

noobanidus avatar Oct 04 '22 23:10 noobanidus

@noobanidus This is due to https://github.com/noobanidus/Lootr/blob/e65147f1c8fab9bb96173a48c535a968982325ac/src/main/java/noobanidus/mods/lootr/event/HandleChunk.java#L21-L32

You are caching every loaded chunk and never unloading it. After a while, the map gets so huge, that every lookup kills the server's performance. Can confirm that on my 1.19.2 server at the moment, Lootr is taking up 80% of the tick time, dropping TPS to 15. You should be able to subscribe to something like onChunkUnload and clean up the chunks that you have loaded once they are no longer needed.

Same issue is present on Forge and Fabric btw.

Screenshot 2023-01-09 at 20 00 07

ViRb3 avatar Jan 09 '23 19:01 ViRb3

I realize the chunk position map is problematic (for other reasons), but please note that this is the chunk position that's being cached, which is a combination of the X and Z coordinates of the chunk, not the actual chunk itself being stored.

Likewise, no chunks are actually being loaded at this point; the LOADED_CHUNKS map is being populated with the coordinates once the chunk has been fully loaded -- there's never any actual chunk loading. Without having this system, the conversion of chests would cause chunk-loading on the main thread which would block and causing significant lag. That said, the data structure could be due for some redesign, as well as the method of accessing it.

However, some further context is needed: is this happening all of the time, for every tick of the server without exploration, or is it only happening during exploration? A link to the full report would be good.

If the former, then it appears as though there's something stuck in the queue for conversion that's causing it to constantly refer back to the list, but failing to convert. Does this issue persist through server restart?

noobanidus avatar Jan 10 '23 01:01 noobanidus

Apologies, I didn't have the time to look carefully and only glimpsed at the code. I was aware that you're not loading the chunks, but I thought you were keeping a reference to them indefinitely, which prevented them from getting GC'd. Given you're only storing the position, that should not be the case, unless the position holds a back-reference to the chunk. Not sure what the structure is there.

This issue happens for every tick, during no exploration at all. A restart helps, bringing the tick usage down from 70% to 30%, but even just after a fresh start, it is still way above everything else. Possible causes could be Compact Machines mod (they create chunk-loaded mini dimensions), and FTB Chunks mod (force loaded some overworld chunks). Maybe a chunk is getting repeatedly loaded/unloaded for some reason? Modpack is ATM8 1.0.7.

You can find the low TPS profile here: https://transfer.sh/InfkwA/ifPYySdjU3.sparkprofile Here's profile right after server restart: https://transfer.sh/paQ6wx/zDrtBKsnh3.sparkprofile Load at: https://spark.lucko.me/

ViRb3 avatar Jan 10 '23 01:01 ViRb3

It seems most likely to me that the TileTicker.tileEntries is continuously growing over time, as HashSet.<init> is appearing with a substantial percentage in the profile, and that should only be slow if the constructor is copying a large number of elements from a provided collection/set. This seems to match @noobanidus's theory about something being stuck in the queue.

Resetting the server probably helps because it empties the set.

embeddedt avatar Jan 10 '23 02:01 embeddedt

I think there are multiple issues here: the large number of stored chunk positions (as far as I'm aware it's just a basic Pair class with the X and Z and some functions, so no actual reference) combined with repeated checks implying that there's a "stuck" tile entry.

Reasoning for the current system and further on what I said before: since 1.17, the standard method of just checking if the chunk that the block entity was in being properly loaded ceased to be sufficient: setBlock to convert the chest to a Lootr chest force-loads 25 chunks (5x5) around the relevant chunk.

Without the current code, it would see that the chest's chunk was loaded, attempt to replace it in the main thread and then proceed to block the main thread as it waited on 25 other chunks to complete generation and loading.

It's definitely an intermittent issue, though, as I've only had a few reports of issues with it, and generally those issues are concurrent with the actual world generation.

However, as you're experiencing it constantly, without world generation or exploration, that means an entry (or multiple entries) are failing to unload.

Steps I'd like to take:

  • A linked hash set would most likely reduce some of the overhead as repeated entries would be accessed quicker.
  • Re-doing "containsAll" chunk to check each required chunk and then store those not contained. This would reduce the number of look-ups and also allow the identification of specific, problematic chunks that are never being "loaded".
  • Re-adding decay to tile entries: if they fail to resolve after X period of time (configurable, perhaps), they should just be removed from the list. Downsides: this could potentially result in some chests not getting converted. Upsides: we can specifically log which entries are being unloaded.
  • Adding decay for the chunk storage. After a certain period of time being in the list, chunks would be removed, resulting in a smaller set of chunks to consider. In theory this should never cause any problems.

I'll need to make some more considerations over this.

At the very least, a debug build could be made that would at the very least identify which block entries are getting "stuck" so there's some way to investigate what's happening...

I'm afraid I've got quite a bit on my plate at the minute, but I will try to work on something later this evening my time. If you're able to join my Discord, I can use that to get you a server-side replacement version for testing, otherwise I can try uploading a version here.

Further to that, which exact version of Lootr is this for 1.19 so I can ensure that I don't cause a version conflict with the debug build?

noobanidus avatar Jan 10 '23 03:01 noobanidus

Dumping some of our findings from Discord to track better here.

I used a debug build provided by @noobanidus that logs entries which are unable to be converted after 2 minutes, as well as remove them from the queue:

:warning: DO NOT use this build in your production servers

From this build, I got the following —

  1. Spammed every tick:

[12:52:51] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / compactmachines:compact_world] at BlockPos{x=8, y=41, z=-1016}

  1. Appeared once:

[12:52:42] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=-194, y=66, z=2096} [12:52:42] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=-193, y=66, z=2096} [12:52:42] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=-195, y=66, z=2096} [12:52:42] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=66, y=67, z=-300} [12:52:43] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=167, y=80, z=-285} [12:52:43] [Server thread/ERROR] [no.mo.lo.ap.LootrAPI/]: Removed old (2 minutes or older) entry for ResourceKey[minecraft:dimension / minecraft:overworld] at BlockPos{x=167, y=80, z=-284}

The offending entry that keeps appearing again and again despite being removed looks like this:

image

It is unknown what causes the chest to keep appearing every tick. For now, you can work around this issue by blacklisting the "compactmachines:compact_world" dimension in Lootr's config. Check the referenced issue above my reply for an example.

ViRb3 avatar Jan 18 '23 19:01 ViRb3

@ViRb3 Were you able to use the custom build I provided via Discord and did that help at all? The issue appeared to be all chests being added at load time rather than being filtered to chests with loot tables -- although that might potentially cause an issue where some tables are assigned their loot table later in the process.

noobanidus avatar Feb 19 '23 03:02 noobanidus

I'm afraid that I had to bring down my server and never had the time to investigate further. The blacklist was a functional workaround, though.

ViRb3 avatar Feb 19 '23 20:02 ViRb3

Hey! We're having a similar issue with lootr taking up a lot of perf. Here's a spark report: https://spark.lucko.me/tTerVBkGFy Specifically net.zestyblaze.lootr.blocks.entities.TileTicker.serverTick() 40.52% I can help test things if needed

booleancoercion avatar Feb 20 '23 18:02 booleancoercion

@booleancoercion I think this is going to be basically identical to Virb3's issue; hopefully the same fixes can do something to help. If you're on Discord, I can try to get you a build. What version of Minecraft, Fabric and Lootr are you running?

noobanidus avatar Feb 21 '23 04:02 noobanidus

Minecraft 1.18.2 Fabric 0.14.10 Lootr 1.18.2-0.3.23.60

We're running a slightly modified version of AOF5 basically, I believe you can find the mod list in the spark report. I'll join the discord server and we can go from there.

booleancoercion avatar Feb 21 '23 19:02 booleancoercion

here to say it occurs on 1.19.2 forge too https://spark.lucko.me/F3O3FBjMch

Wallano-pand avatar Feb 22 '23 16:02 Wallano-pand

I believe this should be addressed with the most recently released versions of Lootr. If people are still having problems, please file a new issue.

noobanidus avatar Mar 30 '23 01:03 noobanidus