Long loading time with large shapes
Context: Version: 3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d Godot version: 4.0.2
Issue:
I encounter long loading times, from 5 to 15 seconds depending on the computer, when loading a scene with large shapes (mainly SS2D_Shape_Closed). This happens when opening the scene in the editor or when loading the scene in game.
I have about ten shapes with a total of 1024 points spread over an area of 32,000x32,000px. Most of them have curved edges. All shapes have a shape material and most of them have multiple textures (edge or fill).
If I remove the shape material from all these shapes, I no longer have a performance issue.
Here is an example of one of the ten shapes:
I can upload more configuration screenshots if needed.
How to reproduce:
I have made a branch on my project and cleaned up the code for a minimal reproduction: https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance You can start the project and click "Load level". The problematic scene can be found at "res://src/level/level.tscn".
Suggestions
Is there anything I can do differently on my side to improve performance?
Smart Shape seems to do some calculations when loading. Could these calculations be saved/cached?
Thanks!
EDIT: I'm ready to work on a PR if needed, but I would need some help to go in the right direction (I don't know how the addon works internally)
Here is the profiler graph when loading the scene. The Process time is at 4424 ms, which causes the freeze when loading the scene.
Script functions do not take much time:

Going step by step, I see that it is the _build_edge_with_material method of shape_base.gd that is taking a long time. It is called by shape_base.gd via _process > _on_dirty_update > cache_edges (shape_closed.gd) > _build_edges > _build_edge_with_material_thread_wrapper > _build_edge_with_material :
https://github.com/SirRamEsq/SmartShape2D/blob/3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d/addons/rmsmartshape/shapes/shape_base.gd#L1440-L1632
IIRC this topic came up on Discord a while ago, but I think nobody investigated the problem further.
My guess is that it might be related to threading in _build_edges() in shape_base.gd.
https://github.com/SirRamEsq/SmartShape2D/blob/3bc77ac5f5bcfd8de3042bfe9631514bdd9c024d/addons/rmsmartshape/shapes/shape_base.gd#L1054-L1063
Can you try to add a print(index_maps.size()) at line 1054 and report how many index maps it processes?
It is possible that it spins up an unhealthy number of threads. I'm not sure how Godot 4 handles synchronization, but it could also be that threads block each other due to accessing shared resources.
You can also try to comment out the threading related lines (1056 - 1060) and add edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0)) instead, so that it computes the shape on the main thread instead of spawning multiple worker threads.
Thanks for your response, I'll test this out asap (probably tomorrow) and get back to you with the result.
If this solves the problem, should I submit a PR that removes the threading? Or should I redo the thread management? I'm rather new to contributing on GitHub, but I would be glad to help ☺️
If it actually solves the problem, we can discuss with the other contributors whether to remove threaded computation or refactor it. This in turn requires further testing to determine where exactly the culprit lies. So, one step after another :)
I was able to test what you suggested @mphe :
After adding the print(index_maps.size()), it tells me that there are 3 maps for the shape I shared above. Each time, the number of maps corresponded to the number of SS2D_Material_Edge_Metadata in the shape, so it seems consistent to me.
Also, when I replace the threads with a direct call as shown below, the loading time gets worse. It goes from 5 seconds to 10:
for index_map in index_maps:
    edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0))
Ok, good to know that threading is not a problem.
Could you please provide an actual minimal reproduction project and not a whole game project?
I cleaned up the following branch so that only what is necessary remains (this was not the case at the beginning): https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance
It is no longer a complete game. Only two scenes are left: a starting scene with just a button to load the level scene, and the level scene itself. In both, I left the bare minimum; only the smart shapes remain in the level.
Will that work for you?
Ah, I'm sorry, I cloned the repo but forgot to switch the branch. Yes, it's fine.
I'll try to look into it in the next days.
Around 3 seconds here (main branch).
Instancing level and drawing first frame took 2854 ms
Here's the benchmark code I used:
# SceneTransition
extends CanvasLayer

@onready var anim_player = $AnimationPlayer


func change_scene(scene_path: String) -> void:
    # anim_player.play("fade")
    # await anim_player.animation_finished
    var start_time = Time.get_ticks_msec()
    get_tree().change_scene_to_file(scene_path)
    # anim_player.play_backwards("fade")
    # await anim_player.animation_finished
    await RenderingServer.frame_post_draw
    var end_time := Time.get_ticks_msec()
    print("Instancing level and drawing first frame took %s ms" % [end_time - start_time])
I looked into it and timed various parts of the process, and the case is clear: the edge generation code is simply an utter mess.
Lots of data is regenerated over and over again instead of caching or reusing the results, e.g. get_vertices(), get_all_point_keys(), get_tesselated_points(), and much more.
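To illustrate what "caching or reusing the results" could look like, here is a minimal sketch of a getter that recomputes only after its input changed. It is not SmartShape2D's actual code: the class name CachedPointsSketch, the compute_count counter, and the midpoint-based stand-in for tessellation are all assumptions made for the example.

```gdscript
# Hypothetical sketch, saved as its own script. Not part of SmartShape2D.
class_name CachedPointsSketch
extends RefCounted

var _points: PackedVector2Array = PackedVector2Array()
var _tessellated_cache: PackedVector2Array = PackedVector2Array()
var _cache_valid: bool = false
# Only here to make the effect of caching observable.
var compute_count: int = 0


func set_points(points: PackedVector2Array) -> void:
    _points = points
    # Any change to the input invalidates the cached result.
    _cache_valid = false


func get_tessellated_points() -> PackedVector2Array:
    # Recompute only when the input changed since the last call.
    if not _cache_valid:
        _tessellated_cache = _compute_tessellated_points()
        _cache_valid = true
    return _tessellated_cache


# Stand-in for the expensive computation: inserts the midpoint of every segment.
func _compute_tessellated_points() -> PackedVector2Array:
    compute_count += 1
    var result := PackedVector2Array()
    for i in range(_points.size()):
        result.push_back(_points[i])
        if i + 1 < _points.size():
            result.push_back((_points[i] + _points[i + 1]) * 0.5)
    return result
```

With this pattern, calling get_tessellated_points() many times during one regeneration pass runs the expensive computation once, as long as set_points() is not called in between.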
Some code could be written in O(n) but is implemented to run in O(n²). For example, get_vertex_idx_from_tessellated_point(), which tries to map a tessellated point to the corresponding, non-tessellated point. This function could be implemented to compute all matches upfront in O(n). Instead, it is called for every tessellated point, increasing the runtime cost to O(n²).
Then I found should_flip_edges(), which alone takes up to 1ms in this test case! This function is also called for every tessellated point even though the result is constant during edge generation. Now imagine a shape with 1000 tessellated points. You will get 1s of loading time just because of this.
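A rough sketch of the "compute all matches upfront" idea follows. It assumes that the tessellated point list contains every original vertex, in the same order, with the extra points inserted between them. Under that assumption a single pass over the tessellated points is enough. The class and function names are hypothetical and do not come from the addon.

```gdscript
# Hypothetical sketch, saved as its own script. Not part of SmartShape2D.
class_name VertexMapSketch
extends RefCounted


# Returns, for every tessellated point, the index of the original vertex
# that starts the segment the point lies on. Runs in O(n) because the
# vertex index only ever moves forward.
# Assumption: `tessellated` contains all `vertices` in the same order.
static func map_tessellated_to_vertices(
        vertices: PackedVector2Array,
        tessellated: PackedVector2Array) -> PackedInt32Array:
    var result := PackedInt32Array()
    result.resize(tessellated.size())
    var vertex_idx := -1
    for t_idx in range(tessellated.size()):
        var next_idx := vertex_idx + 1
        # Advance to the next vertex once the tessellated point reaches it.
        if next_idx < vertices.size() and tessellated[t_idx].is_equal_approx(vertices[next_idx]):
            vertex_idx = next_idx
        result[t_idx] = vertex_idx
    return result
```

The resulting array can then be indexed by tessellated point index inside the edge generation loop, instead of searching for the matching vertex again for every point.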
After pulling that function call out of the loop, it reduced the overall time consumption by ~30-40%.
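The change amounts to hoisting a loop-invariant call out of the loop. The sketch below shows the pattern in isolation. The class, the stand-in should_flip_edges() body, the flip_checks counter, and the offset computation are assumptions for illustration and do not reproduce the addon's code.

```gdscript
# Hypothetical sketch, saved as its own script. Not part of SmartShape2D.
class_name EdgeLoopSketch
extends RefCounted

# Counts how often the (pretend) expensive check runs.
var flip_checks: int = 0


# Stand-in for an expensive check whose result does not change
# while a single edge is being generated.
func should_flip_edges() -> bool:
    flip_checks += 1
    return false


# Before: the check runs once per point.
func build_offsets_slow(points: PackedVector2Array) -> PackedVector2Array:
    var result := PackedVector2Array()
    for p in points:
        var direction := -1.0 if should_flip_edges() else 1.0
        result.push_back(p + Vector2.UP * direction)
    return result


# After: the check runs once, and the loop reuses the result.
func build_offsets_fast(points: PackedVector2Array) -> PackedVector2Array:
    var result := PackedVector2Array()
    var direction := -1.0 if should_flip_edges() else 1.0
    for p in points:
        result.push_back(p + Vector2.UP * direction)
    return result
```

Both functions return the same data. Only the number of calls to the check differs, which is why the saving grows with the number of tessellated points.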
And these are just some examples.
Unfortunately, fixing the performance would require carefully refactoring and optimizing large parts of the generation process, which is not trivial and requires quite a bit of time and effort.
So, it will be more of a longer-term goal to fix the performance.
However, you can get 30-40% loading time reduction for free due to the should_flip_edges() workaround.
I opened a PR here: #120.
Thank you for your time and deep analysis @mphe.
30-40% is a good first step to improve the overall performances. It might be interesting to merge it and then iterate to improve step by step.
What can I do to help? I don't know how the edge generation code works but I'm ready to learn!
Read the source and try to roughly understand how the plugin works. plugin.gd contains most of the UI functionality, shape_base contains most of the shape related functionality. _on_dirty_update() regenerates the shape.
Follow every function call and you will eventually see how some stuff is computed over and over again.
Also try to get a grasp of what is actually going on and check if the implementation could be improved.
That's basically it.
If you find something that can be optimized, try to find a way to do so. Usually this is not that simple and might require some refactoring.
Add timers to identify the parts that consume the most time and start there. I have a simple helper class for time measurements here. Feel free to use it or bring your own.
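For anyone bringing their own, a timer helper can be as small as the following sketch. It is not the helper class mentioned above. The class name StopwatchSketch and its methods are made up for this example; it relies only on Time.get_ticks_usec().

```gdscript
# Hypothetical sketch, saved as its own script. Not the helper linked above.
class_name StopwatchSketch
extends RefCounted

var _label: String
var _start_usec: int


func _init(label: String) -> void:
    _label = label
    # Start measuring as soon as the stopwatch is created.
    _start_usec = Time.get_ticks_usec()


# Milliseconds elapsed since the stopwatch was created.
func elapsed_msec() -> float:
    return (Time.get_ticks_usec() - _start_usec) / 1000.0


func report() -> void:
    print("%s took %.3f ms" % [_label, elapsed_msec()])
```

A possible use is to create var sw := StopwatchSketch.new("_build_edges") at the top of the function being measured and call sw.report() right before it returns.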
If you make changes, test them and ensure the unit and integration tests pass.
Good luck and have fun!
Nice! Should also improve editing performance which hasn't been stellar so far. Also wanted to mention that the mesh could be cached and saved in scene file during generation and simply rendered at runtime. Are there good reasons not to do it?
Isn't that what @remorse107 has been working on?
I believe so. There were some issues with this approach, but if it proves to be a feasible option, it might help address the loading time problem (although it won't resolve the performance issues with the editor).
I initiated another pull request with some other performance improvements: https://github.com/SirRamEsq/SmartShape2D/pull/122
During my tests, I noticed an additional improvement of ~25%.
Note: this is a draft PR. I don't have much time to work on it at the moment, so I won't be able to complete this performance refactoring for now.
Is this issue still relevant? Could somebody who had issues with long loading times test the latest version from GitHub?
I've just tested the test project from the top post with master and master + #122.
Loading times became better, but are still 2-4 seconds (on my machine), for a rather simple scene.
#122 adds a noticeable speed boost, but @martinboue mentioned there is a failing unit test, so it can't be merged yet.
Editing performance is still abysmal.
So, the issue still persists I'd say.
Hi, thanks for bringing attention to this issue.
I'm no longer working with SmartShape2D or Godot. The PR is a draft. I think it needs only minor adjustments, but also serious tests/checks from an expert. I lack the expertise and time to do that.
From what I remember, the PR is "almost stable" and all tests pass. The failing tests that were mentioned relate to a suggested change that is not included in the PR changes. It is only a comment, in case someone wants to explore that improvement.
Ah alright, thank you for the work you put into this! Guess we can salvage your PR and investigate your ideas further.