Hard crash due to colliders tree exceeding maximum traversal depth

Open gentlecolts opened this issue 10 months ago • 27 comments

Describe the bug?

In an MMC world, users who try to join while ProtoFlux is unpacked via redprint crash with what look like ProtoFluxWireManager errors.

  • This crash was happening prior to today's update, but was not happening yesterday
  • This crash is continuing to occur after today's update.
  • This crash only occurs if the flux is already unpacked and on redprint when a user tries to join
  • Does NOT occur if the protoflux is unpacked after the crashing users (everyone we've tested with so far) have joined
  • Does NOT occur if a user attempts to join while nodes are already unpacked via the flux tool (even if redprint is still present in the world).

We attempted to reproduce with the nodes in question in a fresh gridspace and were not able to. We also attempted deleting and recreating the redprint in-world, and this did not resolve the issue.

To Reproduce

see above

Expected behavior

no crash

Screenshots

No response

Resonite Version Number

Beta 2025.2.17.1301

What Platforms does this occur on?

Linux, Windows

What headset if any do you use?

Multiple headsets and Desktop tested

Log Files

Some of our logs may include mods. We tried to make sure they were all clean; however, this was reproduced both with and without mods, including after users with mods removed them. Additionally, some logs are from the host perspective (no crash) and some from the joiner perspective (crash).

Additional Context

more logs coming soon as we continue testing

Reporters

zangooseoo Modern Scraner emmy7 Joshtiger

gentlecolts avatar Feb 17 '25 22:02 gentlecolts

Log where I joined Zangoose while Josh had his flux unpacked on a redprint (I crashed)

DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_22_55.log

Log where I joined our world and had Josh unpack his flux on a Redprint and I had scraner join me. (Scraner crashed)

DESKTOP-QBGCSAQ - 2025.2.17.1301 - 2025-02-17 14_48_21.log

ModernBalloonie avatar Feb 17 '25 23:02 ModernBalloonie

What type of crash is this?

  • Does the person crash out of the world, but Resonite keeps running?
  • Do you get the Unity Crash handler?

What leads you to believe that it's the ProtoFluxWireManager? I can't find any crash exceptions related to it so far in the logs.

Frooxius avatar Feb 17 '25 23:02 Frooxius

What type of crash is this?

Full client crash, not unity crash. Looking for comment from modern re: nature of exceptions

gentlecolts avatar Feb 17 '25 23:02 gentlecolts

What's the difference between full client crash and Unity crash?

Unity crash is pretty much a full client crash, so I'm a bit confused.

Frooxius avatar Feb 17 '25 23:02 Frooxius

The wires have a lot of mention in the logs it seems, and then it crashes.

From my testing, I don't get the unity crash handler, the game just closes.

In regards to the ProtoFluxWireManager, there are a lot of mentions of the ProtoFluxWireManager in the log.

ModernBalloonie avatar Feb 17 '25 23:02 ModernBalloonie

I see. There's a lot of exceptions there, but it doesn't mean they're related to the crash. I'll look through it more to see what might be happening.

Frooxius avatar Feb 17 '25 23:02 Frooxius

I was able to reproduce this crash even after moving our existing world system/assets to a new world. Additionally, the crash continued after the hotfix. We did not have unmodded logs, sorry.

We also observe that grabbing all of the unpacked nodes (when highlighted) and placing them back onto the redprint produces a large lag spike for the one placing them. However, users are now seemingly no longer crashing on join, even after packing and unpacking again. We will keep you updated.

gentlecolts avatar Feb 18 '25 00:02 gentlecolts

I've pushed out a hotfix 2025.2.18.3, which should get rid of those errors for ProtoFluxWireManager, but I'm not confident that's the source of the crash. Can you retest on that one?

Frooxius avatar Feb 18 '25 00:02 Frooxius

For both tests, a fresh world was created and my known-bad-state version of the assets applied.

Testing state: host and joining users are patched, user doing the unpacking is not patched
Result: joining user crashes

DESKTOP-FURD14O - 2025.2.17.1413 - 2025-02-17 19_07_45.log

SCRANER - 2025.2.18.3 - 2025-02-17 19_24_51.log

Testing state: after unpacking user applied hotfix, meaning all users are now patched
Result: joining user crashes

SCRANER - 2025.2.18.3 - 2025-02-17 19_34_44.log

gentlecolts avatar Feb 18 '25 00:02 gentlecolts

I don't think it actually has anything to do with the ProtoFluxWireManager based on that.

Given the nature of the crash, it could be a stack overflow exception. These tend to behave like this - the client just suddenly dies.

Are you able to make a version of the world that reproduces the crash on join and share it with me?

Frooxius avatar Feb 18 '25 00:02 Frooxius

Are you able to make a version of the world that reproduces the crash on join and share it with me?

Working on that, slowly stripping away assets until I get something bare and reproducible. This is an MMC world

gentlecolts avatar Feb 18 '25 00:02 gentlecolts

I've successfully made a minimized version of the world that produces the error with our submission's assets stripped for the most part. The host does not appear to crash when joining it, but users attempting to join will crash. You should not need to otherwise interact with the world or the protoflux that's present. How would you like me to share?

gentlecolts avatar Feb 18 '25 01:02 gentlecolts

https://go.resonite.com/record/G-1QuOmsMXfma/R-48ae0752-fcab-46d9-9ae1-e7203cfd04f3 I think this world url should be correct

gentlecolts avatar Feb 18 '25 01:02 gentlecolts

Thanks! I've been working on this now. I can reproduce this with a headless client joining the session with the debugger attached, too. What's weird is that it also just straight up crashes; Visual Studio doesn't really catch it or show where it happens.

From the error code it does look like stack overrun, which matches how this behaves. But I still need to narrow down where it comes from.

Frooxius avatar Feb 18 '25 20:02 Frooxius

Making progress. I enabled creation of process dumps via registry (https://stackoverflow.com/questions/68215771/can-i-force-windows-create-a-local-dump-of-a-crashing-net-application).

Got the exception: Image
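
For anyone else trying to capture a dump for this kind of crash, here is a minimal sketch of the registry approach described in the linked answer. It assumes the documented Windows Error Reporting `LocalDumps` key and its `DumpFolder`, `DumpCount` and `DumpType` values. The executable name and dump folder in the usage note are placeholders, not the ones used in this investigation, and the helper names are made up for illustration. Writing under `HKEY_LOCAL_MACHINE` needs an elevated process.

```python
import sys

# Documented Windows Error Reporting key for user-mode crash dumps.
LOCAL_DUMPS_KEY = r"SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dump_settings(dump_folder, dump_count=10, full_dump=True):
    """Return the registry values to write (hypothetical helper)."""
    return {
        "DumpFolder": dump_folder,         # where the .dmp files are written
        "DumpCount": dump_count,           # how many dumps to keep
        "DumpType": 2 if full_dump else 1  # 1 = mini dump, 2 = full dump
    }


def apply_local_dump_settings(exe_name, settings):
    """Write per-application dump settings. Windows only, needs admin rights."""
    if sys.platform != "win32":
        raise RuntimeError("LocalDumps is a Windows-only feature")
    import winreg  # imported lazily so the module loads on other platforms

    key_path = LOCAL_DUMPS_KEY + "\\" + exe_name
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "DumpFolder", 0, winreg.REG_EXPAND_SZ,
                          settings["DumpFolder"])
        winreg.SetValueEx(key, "DumpCount", 0, winreg.REG_DWORD,
                          settings["DumpCount"])
        winreg.SetValueEx(key, "DumpType", 0, winreg.REG_DWORD,
                          settings["DumpType"])
```

Usage would look like `apply_local_dump_settings("MyApp.exe", local_dump_settings(r"C:\CrashDumps"))`, with `MyApp.exe` replaced by the crashing process's executable name.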

Frooxius avatar Feb 18 '25 21:02 Frooxius

I narrowed it down some more. It has to do with a particular BoxCollider (always the same one) in the world. When it gets registered with the Bepu physics system, it kills the app. Something in Bepu uses runaway stack allocations. I'm still figuring out why exactly.

Frooxius avatar Feb 18 '25 22:02 Frooxius

hoping it's also the cause of https://github.com/Yellow-Dog-Man/Resonite-Issues/issues/3540

Redd56 avatar Feb 18 '25 23:02 Redd56

Narrowed it down further. It happens due to excessive depth traversal when the body is being registered - this results in overwriting the program stack in Bepu, because it goes deeper than allocated capacity, corrupting the program as a result.

I'm not 100% sure what causes this exactly, it might be that there are just way too many colliders in a single spot.

Frooxius avatar Feb 18 '25 23:02 Frooxius

It's this specifically. It's still an issue in latest version of Bepu in master, although it's an edge case: https://github.com/bepu/bepuphysics2/blob/ec5082f1839a50b8aff07354483994d6b4bad73b/BepuPhysics/Trees/Tree_RayCast.cs#L109
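
To illustrate the failure mode in general terms, here is a toy sketch of a tree traversal that uses a fixed-capacity explicit stack. It is not Bepu's code: the node type, the function names and the capacity of 256 are all assumptions made for the example. Python is memory-safe, so the sketch raises an exception at the point where unmanaged code would instead write past the end of its buffer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy binary tree node; a node with no children is a leaf (one collider)."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class TraversalStackOverflow(Exception):
    """Raised where unmanaged code would overrun its fixed buffer."""


def build_chain(depth: int) -> Node:
    """Build a degenerate tree: each internal node holds one leaf and one subtree."""
    node = Node()
    for _ in range(depth):
        node = Node(left=Node(), right=node)
    return node


def count_leaves_fixed(root: Node, capacity: int = 256) -> int:
    """Visit every leaf, deferring siblings on a fixed-capacity stack."""
    stack = [None] * capacity  # stands in for a stack-allocated array
    top = 0
    leaves = 0
    node = root
    while True:
        if node.left is None:  # leaf reached
            leaves += 1
            if top == 0:
                return leaves
            top -= 1
            node = stack[top]  # resume with a deferred sibling
        else:
            if top == capacity:
                # Without this check, the write below would land out of bounds.
                raise TraversalStackOverflow(f"needs more than {capacity} entries")
            stack[top] = node.left  # defer one child
            top += 1
            node = node.right       # descend into the other
```

With these toy definitions, `count_leaves_fixed(build_chain(200))` returns 201, while `count_leaves_fixed(build_chain(300))` raises `TraversalStackOverflow`, because the chain needs 300 deferred entries and only 256 fit.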

Frooxius avatar Feb 18 '25 23:02 Frooxius

It has to do with a particular BoxCollider (always the same one) in the world

Interesting. Based on testing, it had everything to do with the redprint and flux that's in that world, with the problem actually going away when the flux was removed from the redprint and placed back on it. Is the box collider in there, or is that just coincidence? Or, put differently, are there changes we can actively make to the world itself that would help avoid this edge case?

gentlecolts avatar Feb 19 '25 00:02 gentlecolts

This specific crash has been fixed in 2025.2.19.21! This is currently a workaround, I just increased the stack size for the traversal. It's possible for this to still occur with other scenarios, but it should be harder to trigger.

I've made an issue on Bepu, asking if they can implement more robust handling: https://github.com/bepu/bepuphysics2/issues/354
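
As a rough picture of what "more robust handling" could mean, here is one possible shape: a traversal whose deferred-node buffer grows when it fills up instead of having a hard limit. This is only a sketch under assumptions. The node type, function names and starting capacity are invented for the example, and it says nothing about how Bepu would actually address it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Toy binary tree node; a node with no children is a leaf."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_chain(depth: int) -> Node:
    """Build a degenerate tree: each internal node holds one leaf and one subtree."""
    node = Node()
    for _ in range(depth):
        node = Node(left=Node(), right=node)
    return node


def count_leaves_growable(root: Node, initial_capacity: int = 4) -> Tuple[int, int]:
    """Visit every leaf; return (leaf count, final buffer capacity)."""
    stack = [None] * initial_capacity
    top = 0
    leaves = 0
    node = root
    while True:
        if node.left is None:  # leaf reached
            leaves += 1
            if top == 0:
                return leaves, len(stack)
            top -= 1
            node = stack[top]
        else:
            if top == len(stack):
                # Buffer is full: double it instead of overrunning it.
                stack.extend([None] * len(stack))
            stack[top] = node.left
            top += 1
            node = node.right
```

Here `count_leaves_growable(build_chain(1000))` returns `(1001, 1024)`: the buffer doubles from 4 up to 1024 entries and the traversal completes. Simply raising a fixed capacity, by contrast, only moves the depth at which the overrun happens.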

I'll probably keep the issue open until then.


@gentlecolts It's probably the colliders in that. From what I could tell in the code, it probably bunched up tons of colliders in a very tiny area, which is what made the tree on the Bepu side way too deep.
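
A toy sketch of how coincident colliders can produce a very deep tree. It uses 1-D intervals in place of bounding boxes and a deliberately naive insertion rule (descend into the child whose bounds grow least, ties go left). That rule is an assumption chosen for illustration and is not Bepu's actual insertion heuristic; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Interval = Tuple[float, float]


def union(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def length(a: Interval) -> float:
    return a[1] - a[0]


@dataclass
class Node:
    bounds: Interval
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], box: Interval) -> Node:
    """Insert a box using the naive least-growth rule; ties always go left."""
    if root is None:
        return Node(box)
    node = root
    while node.left is not None:
        node.bounds = union(node.bounds, box)
        grow_left = length(union(node.left.bounds, box)) - length(node.left.bounds)
        grow_right = length(union(node.right.bounds, box)) - length(node.right.bounds)
        node = node.left if grow_left <= grow_right else node.right
    # Reached a leaf: turn it into an internal node holding old and new leaves.
    node.left = Node(node.bounds)
    node.right = Node(box)
    node.bounds = union(node.bounds, box)
    return root


def depth(root: Node) -> int:
    """Depth of the deepest leaf, computed iteratively."""
    best = 0
    pending = [(root, 0)]
    while pending:
        node, d = pending.pop()
        if node.left is None:
            best = max(best, d)
        else:
            pending.append((node.left, d + 1))
            pending.append((node.right, d + 1))
    return best


def build(boxes) -> Node:
    root = None
    for box in boxes:
        root = insert(root, box)
    return root
```

With 300 identical boxes, `depth(build([(0.0, 1.0)] * 300))` is 299: every insertion ties, follows the same path, and adds one more level, so the depth grows linearly with the number of colliders rather than logarithmically.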

Frooxius avatar Feb 19 '25 00:02 Frooxius

#5288 is also an instance of it. Seems Photon Dust is extremely finicky with collisions rn due to Bepu, so maybe we need to push the upgrade of Bepu as a high priority thing? Or maybe it's something with Photon Dust collisions just being extra finicky.

Redd56 avatar Sep 06 '25 21:09 Redd56

okay so after messing with more stuff in game, somehow i managed to make it happen with spatial variables

Redd56 avatar Oct 19 '25 06:10 Redd56

So at this point i can't get the crash to happen consistently with spatial vars either, but it does happen pretty often when inside my fancy gravity defying ship. About 2-10 times a day

Redd56 avatar Nov 10 '25 18:11 Redd56

rn i'm exploring a few meshes i have that are causing the crash! (convex specifically)

Redd56 avatar Nov 28 '25 23:11 Redd56

due to it not providing a log (as per every system access violation crash) im not 100% sure if it is the infinite tree bug, so im gonna open a new issue and let whoever can look at the coredump figure that out. but i do have a replication item this time!

Redd56 avatar Nov 29 '25 00:11 Redd56