godot icon indicating copy to clipboard operation
godot copied to clipboard

Optimize RenderForwardClustered::_setup_render_pass_uniform_set by reducing Vector allocations

Open CrazyRoka opened this issue 1 year ago • 3 comments

Optimize RenderForwardClustered::_setup_render_pass_uniform_set by reducing Vector allocations

This PR optimizes the RenderForwardClustered::_setup_render_pass_uniform_set function by reducing unnecessary vector reallocations.

Previously, the function heavily relied on Vector::push_back to populate the uniforms vector, consuming 1.47% of CPU time according to Hotspot profiling. This PR addresses the issue by:

  • Replaced Vector with thread_local LocalVector<..>. It allows us to allocate memory once and then re-use it without additional allocations.
  • Replaced get_uniforms function with append_uniforms. New function is directly adding values to the collection instead of copying them between collections.

These changes significantly reduce the CPU overhead, bringing down the time spent in LocalVector::clear() to 0.148%. get_uniforms was also removed and reduced CPU from 0.668% to 0.03% in append_uniforms.

Attached are Hotspot screenshots demonstrating the performance improvement:

Before

image

After

image

Testing

Created empty 3D scene with Vsync turned off. Run in DEBUG mode without any logic and any object.

image

CrazyRoka avatar Jul 14 '24 21:07 CrazyRoka

Such a change, despite indeed making performance better, makes the code much less maintenable due to the hardcoded size and indices. The right way of addressing this would be the same in, for instance, https://github.com/godotengine/godot/blob/97b8ad1af0f2b4a216f6f1263bef4fbc69e56c7b/servers/rendering/rendering_device_graph.cpp#L1838-L1848.

Also, since I'm pretty ceratin there will be more cases like this, I'd suggest fixing a bunch (or all?) of them in a single PR.

RandomShaper avatar Jul 15 '24 07:07 RandomShaper

The right way of addressing this would be the same in, for instance,

I know nothing about it but should it actually be cleared and constantly resized if the size in known? Resizing and direct assignment will be much faster, won't it?

arkology avatar Jul 15 '24 08:07 arkology

The right way of addressing this would be the same in, for instance,

I know nothing about it but should it actually be cleared and constantly resized if the size in known? Resizing and direct assignment will be much faster, won't it?

LocalVector has the notion of capacity so it can be cleared without actually resizing the allocation. Vector doesn't provide such a feature, but it should be possible to change the APIs taking a Vector so they take a LocalVector instead. That would be the case of get_cache_vec() here. As long as it's not exposed, that should be safe. Also, LocalVector is more performant than Vector in that it doesn't have the overhead of copy-on-write semantics.

RandomShaper avatar Jul 15 '24 09:07 RandomShaper

Amended to resolve the requested style fixes.

akien-mga avatar Dec 02 '24 13:12 akien-mga

Thanks!

akien-mga avatar Dec 02 '24 14:12 akien-mga