SmartShape2D Long loading time with large shapes

Context: Version: 3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d Godot version: 4.0.2

Issue:

I encounter long loading times, from 5 to 15 seconds depending on the computer, when loading a scene with large shapes (mainly SS2D_Shape_Closed). This happens when opening the scene in the editor or when loading the scene in game.

I have about ten shapes with a total of 1024 points spread over an area of 32,000x32,000px. Most of them have curved edges. All shapes have a shape material and most of them have multiple textures (edge or fill).

If I remove the shape material from all these shapes, I no longer have a performance issue.

Here is an example of one of the ten shapes (screenshot omitted). I can upload more configuration screenshots if needed.

How to reproduce:

I have made a branch on my project and cleaned up the code for a minimal reproduction: https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance You can start the project and click "Load level". The problematic scene can be found at "res://src/level/level.tscn".

Suggestions

Is there anything I can do differently on my side to improve performance?

Smart Shape seems to do some calculations when loading, could these calculations be saved/cached?

Thanks!

EDIT: I'm ready to work on a PR if needed, but I would need some help to go in the right direction (I don't know how the addon works internally)

May 05 '23 18:05 martinboue

Here is the profiler graph when loading the scene. The Process time is at 4424 ms, which causes the freeze when loading the scene. Script functions do not take much time.

Going step by step, I see that it is the _build_edge_with_material method of shape_base.gd that is taking a long time. It is called by shape_base.gd via _process > _on_dirty_update > cache_edges (shape_closed.gd) > _build_edges > _build_edge_with_material_thread_wrapper > _build_edge_with_material:

https://github.com/SirRamEsq/SmartShape2D/blob/3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d/addons/rmsmartshape/shapes/shape_base.gd#L1440-L1632

May 06 '23 07:05 martinboue

IIRC this topic came up on Discord a while ago, but I think nobody investigated the problem further.

My guess is that it might be related to threading in _build_edges() in shape_base.gd. https://github.com/SirRamEsq/SmartShape2D/blob/3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d/addons/rmsmartshape/shapes/shape_base.gd#L1054-L1063

Can you try adding a print(index_maps.size()) at line 1054 and report how many index maps it processes? It is possible that it spins up an unhealthy number of threads. I'm not sure how Godot 4 handles synchronization, but it could also be that threads block each other because they access shared resources.

You can also try commenting out the threading-related lines (1056 - 1060) and adding edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0)) instead, so that it computes the shape on the main thread instead of spawning multiple worker threads.

May 06 '23 11:05 mphe

Thanks for your response, I'll test this out asap (probably tomorrow) and get back to you with the result.

If this solves the problem, should I submit a PR that removes the threading? Or should I rework the thread management? I'm rather new to contributing on GitHub, but I would be glad to help ☺️

May 06 '23 14:05 martinboue

If it actually solves the problem, we can discuss with the other contributors whether to remove threaded computation or refactor it. That in turn requires further testing to find out where exactly the culprit lies. So, one step after another :)

May 06 '23 15:05 mphe

I was able to test what you suggested @mphe:

After adding the print(index_maps.size()), it tells me that there are 3 maps for the shape I shared above. Each time, the number of maps matched the number of SS2D_Material_Edge_Metadata in the shape, so it seems consistent to me.

Also, when replacing the threads with a direct call like below, the loading time gets worse: I go from 5 seconds to 10:

for index_map in index_maps:
	edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0))

May 07 '23 13:05 martinboue

Ok, good to know that threading is not the problem.

Could you please provide an actual minimal reproduction project and not a whole game project?

May 07 '23 17:05 mphe

I cleaned up the following branch to leave only what is necessary (this was not the case at the beginning): https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance

It is no longer a complete game. Only two scenes are left: a starting scene with just a button, and the level scene it loads. In these two, I left the bare minimum; only the smart shapes remain in the level.

Will it work for you?

May 07 '23 19:05 martinboue

Ah, I'm sorry, I cloned the repo but forgot to switch branches. Yes, it's fine.

I'll try to look into it in the next few days.

May 07 '23 19:05 mphe

Around 3 seconds here (main branch).

Instancing level and drawing first frame took 2854 ms

Here's the benchmark code I used:

# SceneTransition
extends CanvasLayer

@onready var anim_player = $AnimationPlayer


func change_scene(scene_path: String) -> void:
#	anim_player.play("fade")
#	await anim_player.animation_finished
	var start_time = Time.get_ticks_msec()
	get_tree().change_scene_to_file(scene_path)
#	anim_player.play_backwards("fade")
#	await anim_player.animation_finished
	await RenderingServer.frame_post_draw
	var end_time := Time.get_ticks_msec()
	print("Instancing level and drawing first frame took %s ms" % [end_time - start_time])

May 07 '23 20:05 limbonaut

I looked into it and timed various parts of the process, and the case is clear. The edge generation code is simply an utter mess. Lots of data is regenerated over and over again instead of caching or reusing the results, e.g. get_vertices(), get_all_point_keys(), get_tesselated_points(), and much more.

Some code could be written in O(n) but is implemented to run in O(n²). For example, get_vertex_idx_from_tessellated_point(), which tries to map a tessellated point to the corresponding non-tessellated point. This function could be implemented to compute all matches upfront in O(n). Instead, it is called for every tessellated point, increasing the runtime cost to O(n²).

Then I found should_flip_edges(), which alone takes up to 1 ms in this test case! This function is also called for every tessellated point even though the result is constant during edge generation. Now imagine a shape with 1000 tessellated points: you get 1 s of loading time just because of this. After pulling that function call out of the loop, the overall time consumption dropped by ~30-40%. And these are just some examples.
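As a rough illustration of the upfront O(n) mapping mentioned above, a single pass over the tessellated points could look like the sketch below. The function name build_vertex_index_map is hypothetical, and the sketch assumes that every vertex reappears in the tessellated list in the same order; neither is taken from the add-on's actual code.

```gdscript
# Hypothetical helper, not part of SmartShape2D.
# Assumption: every entry of `vertices` also appears in `tessellated`,
# in the same order, with interpolated points in between.
static func build_vertex_index_map(
		vertices: PackedVector2Array,
		tessellated: PackedVector2Array) -> PackedInt32Array:
	var result := PackedInt32Array()
	result.resize(tessellated.size())
	var vertex_idx := -1
	for i in tessellated.size():
		# Advance once the tessellated point reaches the next vertex.
		if vertex_idx + 1 < vertices.size() \
				and tessellated[i].is_equal_approx(vertices[vertex_idx + 1]):
			vertex_idx += 1
		# Points before the first match fall back to vertex 0.
		result[i] = maxi(vertex_idx, 0)
	return result
```

Each tessellated point is visited once, so looking up the owning vertex afterwards is a plain array access instead of a fresh search per point.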

Unfortunately, fixing the performance would require carefully refactoring and optimizing large parts of the generation process, which is not trivial and requires quite a bit of time and effort. So, it will be more of a longer-term goal to fix the performance. However, you can get 30-40% loading time reduction for free due to the should_flip_edges() workaround. I opened a PR here: #120.
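To make the should_flip_edges() point concrete, here is a minimal sketch of hoisting a loop-invariant call out of a per-point loop. The functions _is_flipped, build_normals_slow and build_normals_fast are hypothetical stand-ins and do not reflect the actual code in shape_base.gd or the change in the PR.

```gdscript
# Hypothetical stand-ins, not SmartShape2D's real code.
func _is_flipped() -> bool:
	# Imagine an expensive check whose result stays constant during one build.
	return false


func build_normals_slow(points: PackedVector2Array) -> PackedVector2Array:
	var normals := PackedVector2Array()
	for i in range(points.size() - 1):
		var n: Vector2 = (points[i + 1] - points[i]).orthogonal().normalized()
		# The check is re-evaluated for every segment.
		normals.push_back(-n if _is_flipped() else n)
	return normals


func build_normals_fast(points: PackedVector2Array) -> PackedVector2Array:
	# Evaluated once, before the loop.
	var flip := _is_flipped()
	var normals := PackedVector2Array()
	for i in range(points.size() - 1):
		var n: Vector2 = (points[i + 1] - points[i]).orthogonal().normalized()
		normals.push_back(-n if flip else n)
	return normals
```

Both functions return the same result; the second one pays for the check only once per build instead of once per point.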

May 12 '23 12:05 mphe

Thank you for your time and deep analysis @mphe.

30-40% is a good first step towards improving overall performance. It might be worth merging it and then iterating to improve step by step.

What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

May 12 '23 15:05 martinboue

> What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

Read the source and try to roughly understand how the plugin works. plugin.gd contains most of the UI functionality, shape_base contains most of the shape related functionality. _on_dirty_update() regenerates the shape. Follow every function call and you will eventually see how some stuff is computed over and over again. Also try to get a grasp of what is actually going on and check if the implementation could be improved. That's basically it. If you find something that can be optimized, try to find a way to do so. Usually this is not that simple and might require some refactoring. Add timers to identify the parts that consume the most time and start there. I have a simple helper class for time measurements here. Feel free to use it or bring your own. If you make changes, test them and ensure the unit and integration tests pass. Good luck and have fun!
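For anyone who wants to bring their own timer, a minimal sketch could look like the following. The class name ScopeTimer and its API are hypothetical and are not the helper class linked above; it only relies on Time.get_ticks_usec().

```gdscript
# Hypothetical helper, not the class linked above.
class_name ScopeTimer
extends RefCounted

var _label: String
var _start_usec: int


func _init(label: String) -> void:
	_label = label
	# Start measuring as soon as the timer is created.
	_start_usec = Time.get_ticks_usec()


# Prints and returns the time elapsed since creation, in milliseconds.
func stop() -> float:
	var elapsed_ms := (Time.get_ticks_usec() - _start_usec) / 1000.0
	print("%s took %.3f ms" % [_label, elapsed_ms])
	return elapsed_ms
```

Usage could be as simple as creating `var t := ScopeTimer.new("cache_edges")` right before the call under test and calling `t.stop()` right after it.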

May 12 '23 17:05 mphe

Nice! This should also improve editing performance, which hasn't been stellar so far. I also wanted to mention that the mesh could be cached and saved in the scene file during generation and simply rendered at runtime. Are there good reasons not to do it?

May 12 '23 18:05 limbonaut

Isn't that what @remorse107 has been working on?

May 12 '23 23:05 mphe

I believe so. There were some issues with this approach, but if it proves to be a feasible option, it might help address the loading time problem (although it won't resolve the performance issues with the editor).

May 13 '23 02:05 limbonaut

I opened another pull request with some other performance improvements: https://github.com/SirRamEsq/SmartShape2D/pull/122

During my tests, I noticed an additional improvement of ~25%.

Note: this is a draft PR. I don't have much time to work on it at the moment, so I won't be able to complete this performance refactoring for now.

May 21 '23 13:05 martinboue

Is this issue still relevant? Could somebody who had issues with long loading times test the latest version from GitHub?

Jan 17 '24 18:01 limbonaut

I've just tested the test project from the top post with master and master + #122. Loading times have improved, but are still 2-4 seconds (on my machine) for a rather simple scene. #122 adds a noticeable speed boost, but @martinboue mentioned there is a failing unit test, so it can't be merged yet. Editing performance is still abysmal. So I'd say the issue still persists.

Jan 18 '24 18:01 mphe

Hi, thanks for bringing attention to this issue.

I'm no longer working with SmartShape2D or Godot. The PR is a draft. I think it needs only minor adjustments, but also serious tests/checks from an expert. I lack the expertise and time to do that.

From what I remember, the PR is "almost stable" and all tests pass. The failing tests that were mentioned relate to a suggested change that is not included in the PR itself; it is only a comment, in case someone wants to explore the improvement.

Jan 18 '24 22:01 martinboue

Ah alright, thank you for the work you put into this! Guess we can salvage your PR and investigate your ideas further.

Jan 19 '24 19:01 mphe