HolographicDisplays icon indicating copy to clipboard operation
HolographicDisplays copied to clipboard

Make BaseHologramManager thread-safe and guard against nulls

Open dasavick opened this issue 1 year ago • 5 comments

This attempts to fix NullPointerException bubbling on ChunkUnloadEvent by introducing null-check in BaseHologramManager#addHologram and making BaseHologramManager#holograms collection synchronized to further harden the API.

The PR is intended to supersede attempted null-checks in each method from #631. Currently, a draft as I'm testing if the issue is fixed and that seems to be a non-trivial task due to the nature of the problem. At the moment I'm not really sure what is the cause.

[03:25:12 ERROR]: Could not pass event ChunkUnloadEvent to HolographicDisplays v3.0.0-SNAPSHOT-b205
java.lang.NullPointerException: Cannot invoke "me.filoghost.holographicdisplays.plugin.hologram.base.BaseHologram.onChunkUnload(org.bukkit.Chunk)" because "hologram" is null
        at me.filoghost.holographicdisplays.plugin.hologram.base.BaseHologramManager.onChunkUnload(BaseHologramManager.java:75) ~[HolographicDisplays.jar:?]
        at me.filoghost.holographicdisplays.plugin.listener.ChunkListener.onChunkUnload(ChunkListener.java:62) ~[HolographicDisplays.jar:?]
        at com.destroystokyo.paper.event.executor.asm.generated.GeneratedEventExecutor45.execute(Unknown Source) ~[?:?]
        at org.bukkit.plugin.EventExecutor.lambda$create$1(EventExecutor.java:75) ~[purpur-api-1.18.2-R0.1-SNAPSHOT.jar:?]
        at co.aikar.timings.TimedEventExecutor.execute(TimedEventExecutor.java:76) ~[purpur-api-1.18.2-R0.1-SNAPSHOT.jar:git-Purpur-1632]
        at org.bukkit.plugin.RegisteredListener.callEvent(RegisteredListener.java:70) ~[purpur-api-1.18.2-R0.1-SNAPSHOT.jar:?]
        at org.bukkit.plugin.SimplePluginManager.callEvent(SimplePluginManager.java:671) ~[purpur-api-1.18.2-R0.1-SNAPSHOT.jar:?]
        at net.minecraft.world.level.chunk.LevelChunk.unloadCallback(LevelChunk.java:887) ~[?:?]
        at net.minecraft.server.level.ChunkHolder.lambda$updateFutures$8(ChunkHolder.java:575) ~[?:?]
        at net.minecraft.server.level.ChunkMap$CallbackExecutor.run(ChunkMap.java:200) ~[?:?]
        at net.minecraft.server.level.ChunkHolder.updateFutures(ChunkHolder.java:585) ~[?:?]
        at net.minecraft.server.level.DistanceManager.runAllUpdates(DistanceManager.java:246) ~[?:?]
        at net.minecraft.server.level.ServerChunkCache.runDistanceManagerUpdates(ServerChunkCache.java:830) ~[?:?]
        at net.minecraft.server.level.ServerChunkCache.tick(ServerChunkCache.java:904) ~[?:?]
        at net.minecraft.server.level.ServerLevel.tick(ServerLevel.java:650) ~[?:?]
        at net.minecraft.server.MinecraftServer.tickChildren(MinecraftServer.java:1621) ~[purpur-1.18.2.jar:git-Purpur-1632]
        at net.minecraft.server.dedicated.DedicatedServer.tickChildren(DedicatedServer.java:522) ~[purpur-1.18.2.jar:git-Purpur-1632]
        at net.minecraft.server.MinecraftServer.tickServer(MinecraftServer.java:1483) ~[purpur-1.18.2.jar:git-Purpur-1632]
        at net.minecraft.server.MinecraftServer.runServer(MinecraftServer.java:1247) ~[purpur-1.18.2.jar:git-Purpur-1632]
        at net.minecraft.server.MinecraftServer.lambda$spin$1(MinecraftServer.java:321) ~[purpur-1.18.2.jar:git-Purpur-1632]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]

dasavick avatar Jul 13 '22 02:07 dasavick

Wild stuff!

> > server.pluginManager.getPlugin("HolographicDisplays").v2HologramManager.holograms
[04:05:23 INFO]: $> [
	Hologram{position=Position{worldName=world, x=189.07263782238957, y=68.3, z=-9.758278752205966}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=1207.6946531926249, y=63.3, z=-1848.9788851634933}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=221.31690475473889, y=75.8, z=11.300000011920929}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=220.9262401297916, y=70.2375, z=23.084974923072437}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=224.5, y=70.3, z=22.5}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=243.50777416170928, y=106.3, z=960.1811414732996}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=-678.8538744560602, y=-12.75949474255298, z=1664.0480630244513}, lines=[TextLine{text=⚔ 1.00}], deleted=true}, 
	null, 
	Hologram{position=Position{worldName=world, x=-4.300000011920929, y=96.3, z=-1685.512499988079}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=1737.487500011921, y=76.3, z=1851.0782310321765}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=216.95655397764685, y=70.2375, z=30.68788966676058}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=222.76982676406578, y=77.78839775014912, z=25.514431758331156}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=-334.4805594241093, y=9.3, z=-1718.7752759523817}, lines=[], deleted=false}, 
	Hologram{position=Position{worldName=world, x=-1000.4945456248589, y=-53.7, z=292.33610733412524}, lines=[ItemLine{itemStack=ItemStack{BARRIER x 1}}, TextLine{text=00:22:20}], deleted=false}, 
	Hologram{position=Position{worldName=world, x=3387.0508828352545, y=76.3, z=557.5561918167947}, lines=[ItemLine{itemStack=ItemStack{BARRIER x 1}}, TextLine{text=00:08:34}], deleted=false}
]

Quick temp fix using okaeri-poly:

// works for v3.0.0-SNAPSHOT (beta 3)
// groovy cannot into direct field access in super
def c=me.filoghost.holographicdisplays.plugin.hologram.base.BaseHologramManager;
def m=server.pluginManager.getPlugin("HolographicDisplays").v2HologramManager;
def f=c.declaredFields[0];
f.setAccessible(true);
f.get(m).remove(null);

dasavick avatar Jul 13 '22 02:07 dasavick

Tested this PR. Works like a charm!

Tom18314 avatar Aug 30 '22 10:08 Tom18314

My instances were running stable ever since, so at least this isn't breaking something. Converting to normal PR then.

dasavick avatar Aug 31 '22 14:08 dasavick

Thanks for this, but I prefer a different approach: making sure that the calling thread is the main thread. The API is not designed to be used asynchronously, other things will be break if used that way.

The null check needs to be there, but I wonder if there is a null because a hologram is created asynchronously and the list "breaks", or something is actually adding a null element.

filoghost avatar Sep 11 '22 15:09 filoghost

This issue is really a goose chase. But I believe that making BaseHologramManager thread-safe will be much better for backwards compatibility. I'm also convinced the main thread check here will break multiple plugins when there is no need to, assuming the cause was actually a result of an asynchronous call.

I'm also a bit confused what would have caused this in my codebase, as I don't think I had any asynchronous calls to the plugin's API, nor I use any 3rd-party plugin that integrate with it. However, I might have missed something (especially since it was 2 months ago) and concurrent access still seems as the most likely cause.

dasavick avatar Sep 15 '22 04:09 dasavick

After further checking, there are only two places that use the addHologram method and they both add a hologram they directly create, which cannot be null. Both the new and the old API have a check that prevents creating holograms asynchronously, so I cannot figure out how it's possible for that to happen (unless you're bypassing the main API to call those methods asyncrhronously).

I think the cause is one of the "delete" methods that is called asynchronously, since those do not have any check.

I've added more checks in https://github.com/filoghost/HolographicDisplays/commit/777b696f3881d1d27dc6ba249bb904aff9b9e3da, please try again to see if we can catch the issue earlier.

filoghost avatar Oct 01 '22 14:10 filoghost

I think I can confirm this one to be the case without testing. I recall occasionally seeing stacktraces related to this on my test server after being AFK awhile and then moving. Turns out I have AFK holograms and this is the only place where holograms API is called asynchronously, it has a couple of nested methods, but one of the callers is async scheduled timer.

It wasn't that direct, so I missed it before, as it basically goes from periodic task into update routine, then update routine has the case where it goes into deletion handler which is called in most of the cases synchronously.

Just updated my code for this case, but I would strongly recommend synchronization as per my PR anyway for the compatibility reasons, and would add a warning (reason + stacktrace) to these calls if you really want to prevent async delete calls in the future.

dasavick avatar Oct 03 '22 01:10 dasavick

Async calls has never been supported, a check was simply missing. Compatibility is needed when changing features, not when fixing errors.

The API might support async calls in the future, but many things would need to be updated to work properly.

filoghost avatar Oct 03 '22 20:10 filoghost