
Batching - Add MultiRect command

Open lawnjelly opened this issue 1 year ago • 3 comments

Large groups of similar rects can be processed more efficiently using the MultiRect command. Processing common to the group can be done as a one off, instead of per rect.
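To illustrate the idea, here is a minimal sketch of the difference between issuing rects one by one and issuing them as a group. The types and functions (`SharedState`, `Counters`, `draw_rects_individually`, `draw_multirect`) are hypothetical stand-ins for illustration only, not the actual VisualServerCanvas API; the counters simply stand in for the work the renderer would do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration; not Godot's classes.
struct Rect { float x, y, w, h; };
struct SharedState { int texture_id; unsigned modulate_rgba; };

// Counts how much work each path performs.
struct Counters {
    std::size_t state_setups = 0;  // work common to a group of rects
    std::size_t rects_emitted = 0; // work that is genuinely per rect
};

// Per-rect path: the common processing is repeated for every rect.
void draw_rects_individually(const std::vector<Rect> &rects,
                             const SharedState &state, Counters &c) {
    (void)state;
    for (const Rect &r : rects) {
        (void)r;
        ++c.state_setups;
        ++c.rects_emitted;
    }
}

// Grouped command: one shared state plus a list of rects.
struct MultiRectCommand {
    SharedState state;
    std::vector<Rect> rects;
};

// Grouped path: the common processing happens once for the whole group.
void draw_multirect(const MultiRectCommand &cmd, Counters &c) {
    ++c.state_setups;
    c.rects_emitted += cmd.rects.size();
}

// Small self-check comparing the two paths on the same input.
void compare_paths() {
    std::vector<Rect> rects(1000, Rect{0.f, 0.f, 8.f, 8.f});
    SharedState state{1, 0xffffffffu};

    Counters individual;
    draw_rects_individually(rects, state, individual);

    Counters grouped;
    draw_multirect(MultiRectCommand{state, rects}, grouped);

    assert(individual.rects_emitted == grouped.rects_emitted);
    assert(individual.state_setups == 1000);
    assert(grouped.state_setups == 1);
}
```

In this toy model the per-rect work is identical in both paths; only the shared work drops from once per rect to once per group, which is where the saving comes from.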

Adds the new API to VisualServerCanvas, and uses the new functionality from Font, BitmapFont, DynamicFont and TileMap, via the VisualServerCanvasHelper class.

Can be switched on and off with project setting rendering/batching/options/use_multirect (defaults to on), just in case of regressions.

Some measurements from a test benchmark

  • 16 fps - without batching
  • 190 fps - with batching
  • 425 fps - with batching + MultiRect

Test benchmark included below (these figures are faster, apparently because the smaller font, used to save on download size, doesn't support all the Chinese characters used in the benchmark):

  • 43 fps - without batching
  • 450 fps - with batching
  • 1000 fps - with batching + MultiRect

Notes

  • Increases performance in benchmarks by 2.4x over raw batching (large amounts of text, small fill rate)
  • Real world performance increases will probably be a lot more modest, but seems worth doing as the PR is relatively simple, and opens up further avenues for optimization
  • Works through the fonts so will apply to most text using the Font::draw() command (passing a string, rather than by char)
  • Also works for tilemap quadrants; there is a special helper for caching and filling multiple MultiRects within a quadrant.
  • As discussed with @clayjohn, this could potentially be very interesting to add to 4.x, as 4.x has no batching, especially with Vulkan. The specifics would need to be different, though: the font access may be different, and it would need a Vulkan / GL backend for rendering the new command.

Test benchmark

batch_test_text.zip

Press the left button 10 times. Test with batching off / on, and MultiRect off / on.

lawnjelly, Nov 21 '22 15:11

I benchmarked this:

OS: Fedora 36 (KDE + KWin with compositing disabled) CPU: Intel Core i7-6700K @ 4.4 GHz GPU: AMD Radeon RX 6900 XT

GLES2 is used (project default). A non-editor release build with LTO enabled is used for both "Before" and "After (this PR)".

I've also tested GLES3 and it showed similar performance characteristics.

| Type | Before | After (this PR) | Relative performance |
|---|---|---|---|
| 120×120 window (empty) | 10353 FPS | 10967 FPS | 1.06× |
| 120×120 window (full of text) | 1813 FPS | 2747 FPS | 1.52× |
| 3840×2160 fullscreen (empty) | 9501 FPS | 9913 FPS | 1.04× |
| 3840×2160 fullscreen (full of text)[^1] | 93 FPS | 256 FPS | 2.75× |

[^1]: Until text reaches the bottom of the screen, so there is much more text here (as the test project uses the disabled stretch mode).

Warning: MultiRect is disabled in the test project's settings, so make sure to enable it in the project settings first.

Calinou, Dec 02 '22 00:12

Thanks for the comments. Agree on the threading, I'll add the suggestions to the PR when I'm back at home next week. 👍

lawnjelly, Dec 02 '22 08:12

Stack friendly fixed size MultiRect

After trying an approach using a thread-safe pool of MultiRects for fonts etc., I decided I preferred avoiding the mutex and pool altogether: the MultiRect helper class has a fixed maximum size and can be allocated on the stack. This is a lot cheaper than dynamic allocation of both the MultiRect itself and its rect and source lists.

The current size on the stack would be approximately: (Rect (16 bytes) + Source (16 bytes)) * max_rects (2048) = 64 KB, plus a little for state.

This should be fine, as most modern stacks are 1 MB (or 512 KB to be safe), and the MultiRects are unlikely to be used in a recursive fashion.
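The size estimate can be checked with a small sketch of a fixed-capacity, stack-allocatable container. The names here (`FixedMultiRect`, `MAX_RECTS`, `add`, `is_full`) and the layout of `Source` are assumptions made for illustration; the actual helper class in the PR may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts: four floats each, matching the 16-byte figures above.
struct Rect { float x, y, w, h; };
struct Source { float x, y, w, h; }; // assumed to be a source region in the texture

static_assert(sizeof(Rect) == 16, "Rect is expected to be 16 bytes");
static_assert(sizeof(Source) == 16, "Source is expected to be 16 bytes");

constexpr std::size_t MAX_RECTS = 2048;

// Fixed-capacity container: no heap allocation, so it can live on the stack.
struct FixedMultiRect {
    Rect rects[MAX_RECTS];
    Source sources[MAX_RECTS];
    std::uint32_t count = 0; // the "little for state"

    bool is_full() const { return count == MAX_RECTS; }

    // Returns false when full, so the caller can flush and start a new group.
    bool add(const Rect &r, const Source &s) {
        assert(count <= MAX_RECTS);
        if (is_full()) {
            return false;
        }
        rects[count] = r;
        sources[count] = s;
        ++count;
        return true;
    }
};

// (16 + 16) * 2048 = 65536 bytes = 64 KB of payload.
constexpr std::size_t PAYLOAD_BYTES = (sizeof(Rect) + sizeof(Source)) * MAX_RECTS;
static_assert(PAYLOAD_BYTES == 64 * 1024, "payload should be exactly 64 KB");
```

Declaring a `FixedMultiRect` as a local variable then costs no allocation at all, at the price of a hard cap on the number of rects per group; in this sketch the caller is expected to flush and reuse the object when `add` returns false.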

So this new version does most of the filling of MultiRects using the MultiRect helper class, which is fixed in size.

The exception is tilemaps:

Tilemaps

I've slightly altered the mechanism for tilemaps to make it a little more optimal. Instead of using the same general MultiRect cache, it uses a single set of caches (and a single mutex, so that only one tilemap quadrant can be filled at a time). The difference, however, is that the tilemap MultiRect caches can be filled out of order, i.e. you can fill cache 0, start cache 1, then return to filling 0 if a similar set of rects is detected in the quadrant (and an overlap test confirms there is no overlap).

The overlap test is a slight expense, but this will tend to be done as a one off rather than every frame as in the batching, so is a lot more efficient. And if a quadrant contains a lot of changes between swaps and transposes etc, then these will now be efficiently transcribed into a small number of MultiRects.
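One way such out-of-order filling could work is sketched below. This is an assumption-laden illustration, not the PR's code: `QuadrantFiller`, `Cache`, `state_key` and the exact rule for when an older cache may be reused are all hypothetical. The rule used here is that a rect may join an older cache only if it overlaps nothing in the caches that would be drawn after it, so the visible result is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { float x, y, w, h; };

// Axis-aligned overlap test; rects that only touch at an edge do not overlap.
bool overlaps(const Rect &a, const Rect &b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// One cache per group of rects sharing the same state
// (state_key is a hypothetical stand-in for texture, flips, transposes, etc.).
struct Cache {
    int state_key;
    std::vector<Rect> rects;
};

struct QuadrantFiller {
    std::vector<Cache> caches; // drawn in index order

    // Adds a rect and returns the index of the cache it ended up in.
    std::size_t add(int state_key, const Rect &r) {
        assert(r.w >= 0.f && r.h >= 0.f);
        // Walk from the newest cache back towards the oldest.
        for (std::size_t i = caches.size(); i-- > 0;) {
            if (caches[i].state_key == state_key) {
                // Nothing drawn after cache i overlaps r, so joining is safe.
                caches[i].rects.push_back(r);
                return i;
            }
            bool blocked = false;
            for (const Rect &other : caches[i].rects) {
                if (overlaps(r, other)) {
                    blocked = true;
                    break;
                }
            }
            if (blocked) {
                break; // moving r earlier would change what is visible
            }
        }
        caches.push_back(Cache{state_key, {r}});
        return caches.size() - 1;
    }
};
```

In this sketch, a quadrant that alternates between two states without overlapping tiles ends up with two caches rather than one per state change, and a new cache is only started when the overlap test would otherwise alter the draw order of overlapping rects.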

The MultiRect system with add_char_ex() currently doesn't apply to user-created nodes that use add_char(), mainly to keep backward compatibility and keep the API the same. This shouldn't be a big problem, as it is reasonably fast even if MultiRect is not used. We might consider changing the API here if we roll out to 4.x, though, as there is no batching in Vulkan, so the potential benefits will be greater.

Other Canvas Node types

Although @bruvzg suggested there may be a few more canvas types using add_char(), it turns out most use the FontDrawer helper class. As FontDrawer has been converted to use MultiRect, all the canvas classes that use it get MultiRect functionality for free.

This includes:

  • ItemList
  • Label
  • LineEdit
  • RichTextLabel
  • TextEdit

lawnjelly, Dec 06 '22 17:12

Thanks!

akien-mga, Apr 17 '23 15:04