godot icon indicating copy to clipboard operation
godot copied to clipboard

Improve global bone pose calculation in `Skeleton3D`

Open detomon opened this issue 1 year ago • 6 comments
trafficstars

This change introduces a way to calculate global bone poses without recalculating all bones every time a pose changes. It defines the bone hierarchy as a nested set. This makes it possible to represent bone subtrees continuously within an array. With regard to the introduction of SkeletonModifier3D, this should improve performance when using many modifiers that depend on global poses.

Two fields are added to Skeleton3D::Bone:

  • nested_set_offset: defines the position within the nested set[^1]
  • nested_set_span: defines the size of the subtree (the bone itself and all descendants)

[^1]: For imported scenes, nested_set_offset seems to be the same as the bone index. Presumably bones are already ordered this way. However, this may be different for manually created skeletons.

This shows the nested set values for a simplified skeleton:

nested_set

How it works

Skeleton3D::bone_global_pose_dirty is an array containing a flag for each bone indicating whether the global pose needs to be recalculated. It can be indexed with Bone::nested_set_offset, followed by the descendant bones.

When changing the pose of a bone, the dirty flags of its subtree need to be set to indicate that the global poses of the bone and its descendants need to be recalculated. This is done by setting all flags in the range from Bone::nested_set_offset with length Bone::nested_set_span to true. This has some overhead[^2]. But it can be skipped if the bone is already dirty at index Bone::nested_set_offset. Because if a global pose of a bone is dirty, all of its descendants are dirty too.

[^2]: It could be implemented as some kind of bit field structure instead of an array, if it turns out that performance is a problem at that point.

Getting the global pose with get_bone_global_pose

When requesting the global pose of a bone, only the global poses of its parents that are dirty are recalculated and the dirty flag is cleared. In the best case (if the parent global poses have already been calculated) only a few global poses need to be recalculated.

Recalculating global poses with force_update_bone_children_transforms

By looping through Skeleton3D::nested_set_offset_to_bone_index it is possible to update the bone hierarchy without the need for recursion. By checking the dirty flag at the current index, only required bones are updated. The order also ensures, that parent bone poses are calculated before child bone poses.

Benchmarks

skeleton_3d.gd.zip

Test Before After
get_global_poses 12.67ms 12.61ms
set_all_poses 49.83ms 51.98ms
set_all_poses_reverse 87.56ms 89.03ms
set_and_get_global_poses_forward 747.6ms 27.8ms
set_and_get_global_poses_reverse 788.4ms 57.37ms
set_and_get_some_global_poses 29.64ms 15.22ms

Note: I ran the tests individually. Some values were way too high even on master when run in series.

TODO

  • [x] Improve performance of get_bone_global_pose
  • [x] Improve performance of force_update_bone_children_transforms
  • [x] Provide some performance tests
  • [x] Thread safety?

detomon avatar Sep 27 '24 13:09 detomon

Tested on Animation_test.zip.

  • Master: 70 fps
  • This PR: 75 fps

As for thread safe (as I understand it you are talking about using the thread_local keyword) it is multi-thread safe. Although I do not think that this can cause bugs, since the multi-threaded scene tree is not implemented.

Nazarwadim avatar Sep 30 '24 11:09 Nazarwadim

As for thread safe (as I understand it you are talking about using the thread_local keyword) it is multi-thread safe. Although I do not think that this can cause bugs, since the multi-threaded scene tree is not implemented.

I was wondering if there is a problem now that the logic is more complex. But if the scene tree is not multi-threaded then that's ok I guess.

detomon avatar Sep 30 '24 18:09 detomon

As far as I know multi-threaded scene tree is implemented. I'm unable to get you the details, but it was done in an earlier release disabled by default.

fire avatar Sep 30 '24 18:09 fire

As far as I know multi-threaded scene tree is implemented.

Right, the implementation is here tho, it would be cool in look in this for skeleton, anim player/tree multithreading improvements. https://github.com/godotengine/godot/pull/75901

Saul2022 avatar Oct 03 '24 18:10 Saul2022

Is this ready for review? I noticed it is still in draft stage but all the items are checked.

fire avatar Oct 03 '24 18:10 fire

Is this ready for review? I noticed it is still in draft stage but all the items are checked.

Yes, I forgot.

detomon avatar Oct 03 '24 18:10 detomon

Fixed conflicts.

detomon avatar Oct 12 '24 10:10 detomon

Thanks!

Repiteo avatar Oct 14 '24 19:10 Repiteo

@detomon Do you think this can be done to the general 3d node caching system too?

fire avatar Oct 23 '24 17:10 fire

@fire I'm not too familiar with how the Node3D hierarchy works. I will look into that.

detomon avatar Oct 25 '24 07:10 detomon

https://github.com/godotengine/godot/blob/master/scene/3d/node_3d.cpp#L38

fire avatar Oct 25 '24 08:10 fire