Hard crash due to colliders tree exceeding maximum traversal depth
Describe the bug?
In MMC world, when users try to join with protoflux unpacked via redprint, they are crashing with what looks to be ProtoFluxWireManager errors.
- This crash was happening prior to today's update, but was not happening yesterday
- This crash is continuing to occur after today's update.
- This crash only occurs if the flux is already unpacked and on redprint when a user tries to join
- Does NOT occur if the protoflux is unpacked after the crashing users (everyone we've tested with so far) have joined
- Does NOT occur if a user attempts to join while nodes are already unpacked via the flux tool (even if redprint is still present in the world).
We attempted to reproduce with the nodes in question in a fresh gridspace and were not able to. We also attempted deleting and recreating the redprint in-world, and this did not resolve the issue.
To Reproduce
see above
Expected behavior
no crash
Screenshots
No response
Resonite Version Number
Beta 2025.2.17.1301
What Platforms does this occur on?
Linux, Windows
What headset if any do you use?
Multiple headsets and Desktop tested
Log Files
Some of our logs may include mods. We tried to make sure they were all clean; however, this was reproduced both with and without mods, including after users with mods removed them. Additionally, some logs are from the host's perspective (no crash) and some from the joiner's perspective (crash).
Additional Context
more logs coming soon as we continue testing
Reporters
zangooseoo Modern Scraner emmy7 Joshtiger
DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_22_55.log DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_33_18.log DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_48_21.log
Log where I joined Zangoose while Josh had his flux unpacked on a redprint (I crashed)
DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_22_55.log
Log where I joined our world and had Josh unpack his flux on a Redprint and I had scraner join me. (Scraner crashed)
What type of crash is this?
- Does the person crash out of the world, but Resonite keeps running?
- Do you get the Unity Crash handler?
What leads you to believe that it's the ProtoFluxWireManager? I can't find any crash exceptions related to it so far in the logs.
What type of crash is this?
Full client crash, not unity crash. Looking for comment from modern re: nature of exceptions
What's the difference between full client crash and Unity crash?
Unity crash is pretty much a full client crash, so I'm a bit confused.
The wires have a lot of mention in the logs it seems, and then it crashes.
From my testing, I don't get the unity crash handler, the game just closes.
In regards to the ProtoFluxWireManager, there are a lot of mentions of the ProtoFluxWireManager in the log.
I see. There's a lot of exceptions there, but it doesn't mean they're related to the crash. I'll look through it more to see what might be happening.
I was able to reproduce this crash even after moving our existing world system/assets to a new world. Additionally, the crash continued after the hotfix. We did not have unmodded logs, sorry.
We also observe that grabbing all of the unpacked nodes (when highlighted) and placing them back onto the redprint produces a large lag spike for the one placing it; however, users are now seemingly no longer crashing on join, even after packing and unpacking again. We will keep you updated.
I've pushed out a hotfix 2025.2.18.3, which should get rid of those errors for ProtoFluxWireManager, but I'm not confident that's the source of the crash. Can you retest on that one?
for both tests, a fresh world was created and my known-bad-state version of the assets applied
Testing state: host and joining users are patched, user doing the unpacking is not patched Result: joining user crashes
DESKTOP-FURD14O - 2025.2.17.1413 - 2025-02-17 19_07_45.log
SCRANER - 2025.2.18.3 - 2025-02-17 19_24_51.log
Testing state: after unpacking user applied hotfix, meaning all users are now patched Result: joining user crashes
I don't think it actually has anything to do with the ProtoFluxWireManager based on that.
Given the nature of the crash, it could be a stack overflow exception. These tend to behave like this - the client just suddenly dies.
Are you able to make a version of the world that reproduces the crash on join and share it with me?
Working on that, slowly stripping away assets until I get something bare and reproducible. This is an MMC world
I've successfully made a minimized version of the world that produces the error with our submission's assets stripped for the most part. The host does not appear to crash when joining it, but users attempting to join will crash. You should not need to otherwise interact with the world or the protoflux that's present. How would you like me to share?
https://go.resonite.com/record/G-1QuOmsMXfma/R-48ae0752-fcab-46d9-9ae1-e7203cfd04f3 I think this world url should be correct
Thanks! I've been working on this now. I can reproduce this with a headless client, and with a debugger-attached client joining the session too. What's weird is that it also just straight up crashes; Visual Studio doesn't really catch it or show where it happens.
From the error code it does look like stack overrun, which matches how this behaves. But I still need to narrow down where it comes from.
Making progress. I enabled creation of process dumps via registry (https://stackoverflow.com/questions/68215771/can-i-force-windows-create-a-local-dump-of-a-crashing-net-application).
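For anyone wanting to capture a similar dump, here is a minimal sketch of that registry setup. It assumes the Windows Error Reporting `LocalDumps` key with its `DumpFolder`, `DumpCount` and `DumpType` values; the dump folder and the executable name used below are hypothetical placeholders, not taken from this thread. Writing to `HKEY_LOCAL_MACHINE` requires an elevated prompt.

```python
import sys

# Windows Error Reporting key that controls user-mode crash dumps.
LOCAL_DUMPS_KEY = r"SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dump_settings(dump_folder, dump_count=10, full_dump=True):
    """Return the registry values to write, as name -> (type name, value)."""
    return {
        "DumpFolder": ("REG_EXPAND_SZ", dump_folder),
        "DumpCount": ("REG_DWORD", dump_count),
        # DumpType 2 requests a full dump, 1 a mini dump.
        "DumpType": ("REG_DWORD", 2 if full_dump else 1),
    }


def key_path(exe_name=None):
    """Global settings key, or a per-application subkey named after the exe."""
    return LOCAL_DUMPS_KEY if exe_name is None else LOCAL_DUMPS_KEY + "\\" + exe_name


def apply_settings(settings, exe_name=None):
    """Write the settings to the registry (Windows only, needs admin rights)."""
    if sys.platform != "win32":
        raise OSError("LocalDumps is a Windows-only feature")
    import winreg  # imported lazily so the helpers above work on any platform

    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, key_path(exe_name), 0, winreg.KEY_SET_VALUE
    ) as key:
        for name, (type_name, value) in settings.items():
            winreg.SetValueEx(key, name, 0, getattr(winreg, type_name), value)


if __name__ == "__main__" and sys.platform == "win32":
    # "Resonite.exe" and the folder are placeholders; adjust to the real process.
    apply_settings(local_dump_settings(r"C:\CrashDumps"), exe_name="Resonite.exe")
```

Using a per-application subkey keeps the dump settings from applying to every crashing process on the machine.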
Got the exception:
I narrowed it down further some more. It has to do with a particular BoxCollider (always the same one) in the world. When it gets registered with the Bepu physics system, it kills the app. Something in Bepu uses runaway stack allocations. I'm still figuring out why exactly.
Hoping it's also the cause of https://github.com/Yellow-Dog-Man/Resonite-Issues/issues/3540
Narrowed it down further. It happens due to excessive depth traversal when the body is being registered - this results in overwriting the program stack in Bepu, because it goes deeper than allocated capacity, corrupting the program as a result.
I'm not 100 % sure what causes this exactly, it might be that there's just way too many colliders in a single spot.
It's this specifically. It's still an issue in latest version of Bepu in master, although it's an edge case: https://github.com/bepu/bepuphysics2/blob/ec5082f1839a50b8aff07354483994d6b4bad73b/BepuPhysics/Trees/Tree_RayCast.cs#L109
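To illustrate the failure mode described above, here is a toy sketch in Python, not Bepu's actual code. It walks a binary tree using a fixed-capacity stack buffer; the node layout, the capacity of 8 and all names are assumptions made for illustration. In unmanaged code, a push past the buffer would silently overwrite adjacent memory, whereas this sketch raises an error at the point where the overrun would happen.

```python
class Node:
    """Binary tree node; a node with no children is a leaf."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def build_balanced(depth):
    """Balanced tree: 2**depth leaves, needs `depth` stack slots."""
    if depth == 0:
        return Node()
    return Node(build_balanced(depth - 1), build_balanced(depth - 1))


def build_chain(depth):
    """Degenerate tree: each internal node has one leaf and one deeper subtree."""
    node = Node()
    for _ in range(depth):
        node = Node(left=Node(), right=node)
    return node


class TraversalDepthExceeded(Exception):
    pass


def count_leaves(root, capacity):
    """Visit every leaf using a preallocated stack of `capacity` slots."""
    stack = [None] * capacity  # stands in for a fixed-size stack allocation
    top = 0
    node = root
    leaves = 0
    while True:
        if node.left is None:  # leaf reached
            leaves += 1
            if top == 0:
                return leaves
            top -= 1
            node = stack[top]
        else:
            if top == capacity:
                # Unchecked native code would write past the buffer here.
                raise TraversalDepthExceeded(f"needs more than {capacity} slots")
            stack[top] = node.left  # defer one child, descend into the other
            top += 1
            node = node.right


print(count_leaves(build_balanced(8), capacity=8))  # 256 leaves fit in 8 slots
print(count_leaves(build_chain(8), capacity=8))     # 9 leaves, exactly fits
try:
    count_leaves(build_chain(9), capacity=8)        # only 10 leaves, but too deep
except TraversalDepthExceeded as err:
    print("overrun detected:", err)
```

The point of the comparison is that depth, not node count, decides whether the buffer is large enough: a badly unbalanced tree with a handful of leaves can overrun a stack that comfortably handles a much larger balanced one.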
It has to do with a particular BoxCollider (always the same one) in the world
Interesting. Based on testing, it had everything to do with the redprint and flux that's in that world, with the problem actually going away when the flux was removed from the redprint and placed back on it. Is the box collider in there, or is that just coincidence? Or, put differently, are there changes we can actively make to the world itself that'd help avoid this edge case?
This specific crash has been fixed in 2025.2.19.21! This is currently a workaround, I just increased the stack size for the traversal. It's possible for this to still occur with other scenarios, but it should be harder to trigger.
I've made an issue on Bepu, asking if they can implement more robust handling: https://github.com/bepu/bepuphysics2/issues/354
I'll probably keep the issue open until then for now.
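As a rough illustration of the trade-off between the workaround and more robust handling: a larger fixed buffer only raises the depth at which the overrun occurs, while a buffer that grows on demand removes the limit at the cost of an occasional allocation. The sketch below shows the growable variant in Python. It is one possible approach under assumed names and an assumed toy tree layout, not what Bepu or Resonite actually implement.

```python
class Node:
    """Binary tree node; a node with no children is a leaf."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def build_chain(depth):
    """Degenerate tree whose traversal needs `depth` stack slots."""
    node = Node()
    for _ in range(depth):
        node = Node(left=Node(), right=node)
    return node


def count_leaves_growable(root, initial_capacity=4):
    """Count leaves, doubling the stack buffer instead of overrunning it.

    Returns (leaf count, final buffer capacity).
    """
    stack = [None] * initial_capacity
    top = 0
    node = root
    leaves = 0
    while True:
        if node.left is None:  # leaf reached
            leaves += 1
            if top == 0:
                return leaves, len(stack)
            top -= 1
            node = stack[top]
        else:
            if top == len(stack):
                # Slow path: the buffer is full, so double it before pushing.
                stack.extend([None] * len(stack))
            stack[top] = node.left
            top += 1
            node = node.right


leaves, capacity = count_leaves_growable(build_chain(100))
print(leaves, capacity)  # 101 leaves; buffer grew from 4 to 128 slots
```

The common case stays on the preallocated buffer, and only pathologically deep trees pay for the growth.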
@gentlecolts It's probably the colliders in that. From what I could tell in the code, it probably bunched up tons of colliders in very tiny area, which is what made the tree on Bepu side way too deep.
#5288 is also an instance of it. Seems Photon Dust is extremely finicky with collisions right now due to Bepu, so maybe we need to push the Bepu upgrade as a high priority thing? Or maybe it's something with Photon Dust collisions just being extra finicky.
Okay, so after messing with more stuff in game, I somehow managed to make it happen with spatial variables.
So at this point I can't get the crash to happen consistently with spatial variables either, but it does happen pretty often when inside my fancy gravity-defying ship. About 2-10 times a day.
Right now I'm exploring a few meshes I have that are causing the crash! (convex specifically)
Due to it not providing a log (as with every system access violation crash), I'm not 100% sure if it is the infinite tree bug, so I'm gonna open a new issue and let whoever can look at the coredump figure that out. But I do have a replication item this time!