godot icon indicating copy to clipboard operation
godot copied to clipboard

Skeleton3D/PhysicalBoneSimulator does not clean up internal state when removing bones, crashes on further updates.

Open mikest opened this issue 10 months ago • 6 comments

Skeleton3D/PhysicalBoneSimulator does not clean up internal state when removing bones

  • Reproduced in 4.3, 4.4b3
  • Windows 11, Godot-4.4beta3

Description

There is a bug which prevents dynamically removing and then adding bones to a skeleton.

The use case for dynamically constructing skeletons is in creating generative creatures, mechanism, and so on. For my example, I use a ball and chain simulation with dynamic number of links. The scene componentry builds a chain link up from a link template and a ball template. This repo is a test case for demonstrating the behavior.

The Chain.gd file creates a ball-and-chain simulation with a dynamic number of links that can be changed directly within the editor. It is the only script file of interest for the project.

Project for reproducing:

https://github.com/mikest/godot-chain/

Steps to reproduce crash

  1. Open project.
  2. Open Scene "Chain.tscn". You should see a monkey head on a chain.
  3. Select the root node.
  4. In the properties panel there is a counter for the number of chain links.
  5. Increase the counter by one.

Results:

  1. Editor segfaults and exits.

Investigation

Skeleton3D, PhysicalBoneSimulator3D, and PhysicalBone3D all maintain a set of internal caching structures that appear to be there as speed optimizations. These caching structures are not properly kept in sync when removing physical bones or when calling Skeleton3D::clear_bones() to remove logical bones. In most cases, this results in numerous errors logged in the console, but on occasion, it can also result in crashes.

This PR is an attempt to reduce those errors and to reduce the conditions which trigger spurious console logging from cache size misses due to stale cache data.

To address this, I included more cache invalidation in Skeleton3D::clear_bones().

I also fixed cache sync problems with the bone_global_pose_dirty and nested_set_offset_to_bone_index in Skeleton3D::add_bone(const String &p_name).

Many of the logging errors that arrise after removing and adding a bone are due to caches being rebuilt before the PhysicalBone3D has been added to the tree. I have moved the cache rebuild to NOTIFICATION_POST_ENTER_TREE. This fixes the errors from _reload_joints.

The last change I made was to the PhysicalBoneSimulator3D::_pose_updated() call, which when detecting a cache size mismatch between the skeleton bone count and the internal bones cache will rebuild it. This happens when a physical bone is still attached and the Skeleton clears the bones and then rebuilds them to a different size. The simulator will make pose updates and cause cache misses.

Related Issues

Fix error spam at start with Skeleton3D modifiers #102030 https://github.com/godotengine/godot/pull/102030

NOTE: I'm not terribly familiar with the godot internals and this is my first PR, so extra attention is probably warranted.

mikest avatar Feb 17 '25 05:02 mikest

Yes, this is the right direction. I can reproduce this. The core cause of the crash seems to be an oversight when implementing #97538. bone_global_pose_dirty and nested_set_offset_to_bone_index are not recalculated in Skeleton3D::_force_update_bone_children_transforms when bones have been added/changed before, resulting in an out of bounds error. I will make a PR to address the crash issue.

detomon avatar Feb 17 '25 13:02 detomon

@detomon I'm new here so I don't know what the norms are for what happens next, but it sounds like you'll part out this PR and work on your own? If so, please tag me when you do, I'd love to see how you've gone about it :)

The scope of this particular PR was "get rid of the crash and all the console spew when creating dynamic resizable skeletons." If I want to see those issues addressed as well will I need to open a new PR with just those?

There are several assumptions being made about the relationship between Skeleton3D, PhysicalBone3D & PhysicalBoneSimulator3D with respect to adding and removing bones that should probably also be addressed at some point.

If there's anything I can do to help in pushing this forward, I would love to do so!

mikest avatar Feb 17 '25 17:02 mikest

My PR is not needed. The fix for the crash is quite simple – you can copy the change from this commit: https://github.com/detomon/godot/commit/5203489d6c10945a35bbb82b149a4d32b25ab7b8.

Then you don't need the changes in Skeleton3D::add_bone (although it fixes the crash, it sets the wrong values). Clearing the values in Skeleton3D::clear_bones is also not strictly necessary, as process_order_dirty = true below ensures, that the arrays are updated when needed. And I don't think skin_bindings.clear() should be done at that point.

I'm not an expert on the other issues with PhysicalBone3D and PhysicalBoneSimulator3D though.

detomon avatar Feb 17 '25 20:02 detomon

The commit #5203489 address the crash. Thank you!

I still see ERROR: scene\3d\physical_bone_simulator_3d.cpp:352 - Index p_bone = 8 is out of bounds (bone_size = 5). after removing and adding bones a few times as p_bone still has stale data in it.

The ERROR: scene\3d\node_3d.cpp:466 - Condition "!is_inside_tree()" is true. Returning: Transform3D() also still persist due to the notification issue being at the wrong point.

When should the skin_bindings be updated?

mikest avatar Feb 18 '25 17:02 mikest

I don't think skin_bindings has anything to do with the errors and should not be updated/cleared. I assume it will break mesh deformations otherwise.

Maybe @TokageItLab can help with these errors.

detomon avatar Feb 20 '25 07:02 detomon

The principle at play here should be: "When the bones are removed from skeleton, objects elsewhere should no longer reference them." This should include the skin bindings.

Hopefully, that's not a controversial design principle.

mikest avatar Feb 21 '25 16:02 mikest

Is this at all related to this issue? Because I'm having issues. 4.4 stable.

E 0:00:41:645 get_global_transform: Condition "!is_inside_tree()" is true. Returning: Transform3D() <C++ Source> scene/3d/node_3d.cpp:466 @ get_global_transform()

(Mark as offtopic if not, Or for other reasons. I just have the time to make my own reproduction project to see if this fix can be resolved for a 4.4.1 build.)

TheRealFame avatar Mar 05 '25 00:03 TheRealFame

Maybe. There's a bunch of skeleton code that will try and access the global_transform before it has been loaded into the tree. It produces those errors. It could also be some other component.

Do you see this when adding skeletons with physics bones?

mikest avatar Mar 10 '25 17:03 mikest

I deleted my source branch, but I didn't realize that this would delete the PR... anyway. I'm going to have to rebuild this PR :(

mikest avatar Mar 10 '25 17:03 mikest

Maybe. There's a bunch of skeleton code that will try and access the global_transform before it has been loaded into the tree. It produces those errors. It could also be some other component.

Do you see this when adding skeletons with physics bones?

Fyi, If you were asking my input~ The error only shows both 3 times in debugger, And in the print log.

TheRealFame avatar Mar 10 '25 18:03 TheRealFame