Optimize AnimationMixer blend process
This PR optimizes AnimationMixer's _process_animation method.
void AnimationMixer::_process_animation(double p_delta, bool p_update_only) {
	_blend_init();
	if (_blend_pre_process(p_delta, track_count, track_map)) {
		_blend_capture(p_delta);
		_blend_calc_total_weight();
		_blend_process(p_delta, p_update_only);
		_blend_apply();
		_blend_post_process();
		emit_signal(SNAME("mixer_applied"));
	}
	clear_animation_instances();
}
Benchmarking methods:
I benchmarked how long each of these methods takes for one 3D model driven by an AnimationTree.
Some notes for reading the results:
- Units of measurement: usec.
- is_process = _blend_pre_process
- weight = _blend_calc_total_weight
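For readers who want to reproduce this kind of per-stage measurement, here is a minimal sketch of accumulating microseconds per label. StageTimings and time_stage are hypothetical helpers built on std::chrono; they are not part of Godot and are not necessarily how the numbers above were collected.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper (not a Godot type): total elapsed microseconds per stage label.
struct StageTimings {
	std::map<std::string, std::int64_t> usec;
};

// Runs `stage` once and adds its wall-clock duration, in microseconds,
// to the running total stored under `label`.
template <typename F>
void time_stage(StageTimings &timings, const std::string &label, F &&stage) {
	const auto start = std::chrono::steady_clock::now();
	stage();
	const auto end = std::chrono::steady_clock::now();
	timings.usec[label] +=
			std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Wrapping each call inside _process_animation, for example time_stage(timings, "is_process", [&] { /* stage call */ });, would give one total per stage that can be compared between master and the PR.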
Master:
You can see here that the _blend_process, _blend_pre_process and _blend_calc_total_weight methods take the most time.
Results with this PR:
You can see that _blend_process has improved by 30%, _blend_calc_total_weight has improved by 15%, and _blend_pre_process has improved by 50%.
Real project benchmarks:
Master:
- Project FPS: 56 (17.85 mspf)
- Project FPS: 56 (17.85 mspf)
- Project FPS: 56 (17.85 mspf)
- Project FPS: 56 (17.85 mspf)
- Project FPS: 55 (18.18 mspf)
- Project FPS: 55 (18.18 mspf)
Current PR:
- Project FPS: 68 (14.70 mspf)
- Project FPS: 69 (14.49 mspf)
- Project FPS: 66 (15.15 mspf)
- Project FPS: 66 (15.15 mspf)
- Project FPS: 68 (14.70 mspf)
That is a pretty good gain of about 22%. Note that #92554 also improves animation performance; combined with this PR, the total improvement is about 40%.
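As a sanity check on the percentage, the arithmetic can be sketched as follows. The helper names are made up for this illustration; the inputs are the FPS samples listed above. Averaging all samples gives a gain of about 21%, in line with the rounded figure quoted here.

```cpp
#include <numeric>
#include <vector>

// Mean of a list of FPS samples.
double mean_fps(const std::vector<double> &samples) {
	return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Relative gain of `after` over `before`, in percent.
double gain_percent(double before, double after) {
	return (after / before - 1.0) * 100.0;
}

// Frame time in milliseconds for a given FPS (the "mspf" column).
double ms_per_frame(double fps) {
	return 1000.0 / fps;
}
```

For example, gain_percent(mean_fps({56, 56, 56, 56, 55, 55}), mean_fps({68, 69, 66, 66, 68})) compares the two runs, and ms_per_frame(56) reproduces the 17.85 mspf entry up to rounding.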
How to test:
The Animation_test.zip project is attached in the benchmark section of #92554. After opening it, the FPS is printed to the console.
What was done:
Animation:
- For enums, the size was reduced from 4 to 1 byte.
- Added the get_tracks method.
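The enum size reduction can be illustrated with a fixed underlying type. This is only a sketch: the enum and struct names below are hypothetical and are not copied from animation.h.

```cpp
#include <cstdint>

// Hypothetical enums for illustration only.
// Without a fixed underlying type, compilers typically use int (4 bytes).
enum DefaultInterpolation { DEFAULT_NEAREST, DEFAULT_LINEAR, DEFAULT_CUBIC };

// Fixing the underlying type to uint8_t makes the enum exactly 1 byte.
enum CompactInterpolation : std::uint8_t { COMPACT_NEAREST, COMPACT_LINEAR, COMPACT_CUBIC };

// Hypothetical track headers showing how the enum size affects the struct size.
struct TrackWithDefaultEnums {
	DefaultInterpolation interpolation;
	DefaultInterpolation loop_mode;
	bool enabled;
};

struct TrackWithCompactEnums {
	CompactInterpolation interpolation;
	CompactInterpolation loop_mode;
	bool enabled;
};

static_assert(sizeof(CompactInterpolation) == 1, "uint8_t-backed enum is one byte");
static_assert(sizeof(TrackWithCompactEnums) <= sizeof(TrackWithDefaultEnums),
		"compact enums never make the struct larger");
```

Smaller per-track data means more tracks fit in a cache line, which is the likely motivation for the change.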
AnimationTree:
- I made track_map a pointer in order not to copy maps.
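A minimal sketch of the difference between copying a map and holding a pointer to it, using std::unordered_map as a stand-in. TrackMap, MixerByValue, and MixerByPointer are hypothetical names, not the actual AnimationTree types.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real track map type.
using TrackMap = std::unordered_map<std::string, int>;

// Copies every entry each time the map is handed over.
struct MixerByValue {
	TrackMap track_map;
	void set_track_map(const TrackMap &p_map) { track_map = p_map; }
};

// Stores only the address; no entries are copied.
// The caller must keep the pointed-to map alive while it is in use.
struct MixerByPointer {
	const TrackMap *track_map = nullptr;
	void set_track_map(const TrackMap *p_map) { track_map = p_map; }
};
```

The pointer version trades a copy per call for a lifetime requirement on the owner of the map.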
AnimationMixer:
- Use getptr instead of has + operator[].
- Use int count = a->get_track_count(); so that the number of iterations can be kept in a register.
- Use the get_tracks method added to animation.h to take the track array: const Vector<Animation::Track *> tracks = a->get_tracks();
- In the post_process_key_value method, cache whether there is a GDVIRTUAL_CALL. That is, there will be only 1 check per _blend_process call.
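The lookup and caching ideas in this list can be sketched with standard containers. Everything below is a hypothetical analogue: lookup_once plays the role of a getptr-style single lookup, and Processor::has_override stands in for the per-call virtual-override check. None of it is Godot code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Cache = std::unordered_map<std::string, double>;

// Two hash lookups on a hit: one for the existence check, one for the access.
double *lookup_twice(Cache &cache, const std::string &key) {
	if (cache.count(key) == 0) {
		return nullptr;
	}
	return &cache[key];
}

// One hash lookup: returns a pointer to the value, or nullptr if absent.
double *lookup_once(Cache &cache, const std::string &key) {
	auto it = cache.find(key);
	return it == cache.end() ? nullptr : &it->second;
}

// Hypothetical processor that checks for an override once per blend() call
// instead of once per value.
struct Processor {
	int override_checks = 0;

	bool has_override() {
		++override_checks; // Counts how often the (assumed costly) check runs.
		return false;
	}

	double post_process(double value, bool overridden) const {
		return overridden ? value * 2.0 : value;
	}

	double blend(const std::vector<double> &values) {
		const bool overridden = has_override(); // Cached for the whole call.
		const std::size_t count = values.size(); // Iteration count read once.
		double total = 0.0;
		for (std::size_t i = 0; i < count; i++) {
			total += post_process(values[i], overridden);
		}
		return total;
	}
};
```

Both lookup functions return the same pointer for an existing key; the single-lookup version simply avoids hashing the key twice.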
Probably closes: #92693
Your test project is not very suitable for such a test because of the dynamic camera and the large number of meshes. I used the project from a recent discussion (#92724), which uses 301 skeletons, and it really showed excellent results!
4.3 beta1 (master): FPS 47-48
This PR: FPS 56-58
I'll attach a project that is great for testing this PR: anm.zip
Let's see what @TokageItLab says about the code
I would say that by itself it won't be enough, but together with the PR you mentioned it might close it, as the combination reduces the cost a lot more (40%, as you stated).
Either way, this is a great job, and hopefully it gets merged quickly in 4.4 along with your other great optimization PRs.
There are no compatibility problems or major new features here; this is just an optimization of what already exists. So I think this would work well for 4.3 as well.
Animation does not support multi-core
@WOLFxxxxxx What you are talking about has nothing to do with this PR. This PR is an optimization for several allocations in the blending process and does not address multithreading. That will have to be done in another PR. Please confine the discussion here to what this PR implements.
@reduz I think it's time to merge it with 4.4, no conflicts related to this PR have been found
Thanks!