Godot 4.x is significantly slower than 3.5.1 in creating nodes

Open Overvault-64 opened this issue 1 year ago • 25 comments

Godot version

4.0.beta10

System information

Windows 10 and Android

Issue description

In my projects I often need to create UI at runtime, instantiating a lot of elements. This brought a significant performance regression in Godot 4 to my attention: creating or instantiating nodes is about 4 times slower than in Godot 3.

I've repeated the tests several times with my test projects, making both versions instantiate a scene (a button containing 3 child nodes) a varying number of times, from 1000 to 8000. Depending on the count, Godot 4 ranges from almost 4 times to more than 4 times slower.

Godot 4 crashed when instantiating more than 8000 of those scenes, while Godot 3 handled more than 10000 comfortably. My hardware utilization stayed below 30% the whole time.

I've also tried creating the button at runtime and it makes no significant difference compared to instantiating a pre-made scene.

image

image

This wouldn't be a real-world problem for most developers (except me, I guess) if it weren't amplified so much on mobile. My mobile test device needed 6-7 times longer to complete the task on both versions, leading to an unhealthy 21 seconds with 8000 scenes on the Godot 4 build. The Godot 4/Godot 3 slowdown ratio on mobile is about the same as on PC.

Minimal projects attached

Steps to reproduce

Start the projects

Minimal reproduction project

node.test.zip

Overvault-64 avatar Jan 10 '23 20:01 Overvault-64

Related to https://github.com/godotengine/godot/issues/61929.

Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?

Calinou avatar Jan 11 '23 02:01 Calinou

Doesn't that mean that the engine is capable of handling more nodes with less resource uptake, or fewer spikes? Isn't it possible to adjust these things in the profiler? I think Godot has one already. So this needs more thorough observation and testing. Plus, running them side by side is not indicative. I heard some CPUs are capable of repeating workloads once they have understood them... they don't seem to be in circulation. Can people change that? Sorry for being so unfactual.

Yes, optimization does not matter if you can adjust the thread size, but that could limit mobile phone performance, I mean Android specifically, because you are locking yourself out of certain devices that don't have the RAM to deal with the workload, even with blast processing and spiked CPUs, which they all are in the mobile hemisphere, so clock times must be an issue for the new release, and if so, why are they not? If it's all software anyway and we are reading from self-updating text files, then that's Linux fixing that.

I feel like this is way more interesting if you think closer than with an open eye, BUT here's the deal... it just takes longer because people want to have more nodes in their projects, created by hand or while running, right? So that takes more power on average to handle, because that's how streams and data work: once you turn it into something closer to a machine, you can rearrange it and it will still present the same way to the user with less data taken up in all regards. Sounds crazy, it's the truth; not advised for smaller data sizes, like a thousand buttons... but you have a keen eye... good job! I'm not an expert! But that seems to be the general thing when it comes to handling LOTS of things at THE SAME time. Everything else is just an illusion of that and it's handled one by one, which doesn't create the necessary behavior anyway.

wardPlaced avatar Jan 11 '23 11:01 wardPlaced

> Related to #61929.
>
> Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?

Yes

Overvault-64 avatar Jan 11 '23 11:01 Overvault-64

Still slow in beta13

image

Overvault-64 avatar Jan 19 '23 09:01 Overvault-64

4.0.1 (stable) image

Overvault-64 avatar Mar 24 '23 15:03 Overvault-64

Can you reproduce this with a self-compiled editor build with the production=yes module_text_server_advanced_enabled=no module_text_server_fb_enabled=yes SCons options? This uses a simpler and faster TextServer that has advanced features disabled (no right-to-left or complex scripts).

Calinou avatar Mar 24 '23 17:03 Calinou

@Calinou good point. While the issue title describes "nodes" in general, this benchmark uses UI nodes, which aren't exactly simple.

Riteo avatar Mar 25 '23 12:03 Riteo

@Calinou I don't have a compile environment set up, but @Riteo's comment made me think that I could benchmark different kinds of nodes and look at the results. That way I could see which node types are harder on the engine and maybe identify a common cause. Does that make sense?

Overvault-64 avatar Mar 26 '23 16:03 Overvault-64

> but @Riteo's comment made me think that I can benchmark different kind of nodes and look at the results. This way I could see which node types are harder on the engine and maybe identify a common cause. Makes sense?

You can try to do that, but the best way to isolate the bottleneck is to switch TextServers as I mentioned. I get a strong feeling the slowness is due to text shaping, not node creation. Text shaping in 4.0 regularly comes up as one of the most demanding operations when I look at results in a C++ profiler (the editor profiler won't show it).

You can also use a C++ profiler on a debug build of the engine.

Calinou avatar Mar 26 '23 16:03 Calinou

@Calinou I hope I did it right. Here are the results, but I don't know how to read them.

I've used the godot-4.0-editor-debug-windows-msvc2022 build

Overvault-64 avatar Mar 26 '23 17:03 Overvault-64

4.1-beta1 image

Overvault-64 avatar Jun 08 '23 15:06 Overvault-64

4.1-stable (still the same exact hardware and configuration) image

Overvault-64 avatar Jul 08 '23 14:07 Overvault-64

4.2.beta1 image

Overvault-64 avatar Oct 13 '23 08:10 Overvault-64

A hunch: What if you disable advanced text server when compiling?

Zireael07 avatar Oct 13 '23 09:10 Zireael07

> A hunch: What if you disable advanced text server when compiling?

I can't compile :(

Overvault-64 avatar Oct 13 '23 23:10 Overvault-64

v4.3.dev1.official [9d1cbab1c] image

Overvault-64 avatar Dec 31 '23 11:12 Overvault-64

I am seeing something similar but unlike @Overvault-64 I don't have a 3.x version to compare with.

Calling PackedScene.Instantiate<Control>() 20 times takes a considerable amount of time on Android. The game does not freeze, but it clearly takes a couple of seconds for the UI to render.

Running 4.3-dev5.mono

https://github.com/godotengine/godot/assets/11413364/d06f1135-744f-4f9c-8ca6-796480e447a8

In the video, after pressing the button, 20 other buttons will be instantiated (here I left them as simple as possible and instantiate Controls instead, so none of those buttons are actually visible). Notice that the title of the next menu (5x5) takes a couple of seconds to appear.

duarteroso avatar Mar 28 '24 15:03 duarteroso

Still present even in 4.2.2. To mitigate this, I have created a queue for each place where I need to load a bunch of elements at once, and I process one instantiation per frame.
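The queue-based workaround can be sketched roughly as follows. This is an engine-independent illustration in Python, not the commenter's actual code: `DeferredSpawner`, `enqueue`, and `process_frame` are hypothetical names, and each queued `factory` stands in for an expensive call such as instantiating a scene. In a Godot project the per-frame step would presumably be driven from a node's per-frame callback.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class DeferredSpawner:
    """Runs at most `per_frame` queued factory calls each frame (hypothetical helper)."""

    def __init__(self, per_frame: int = 1) -> None:
        if per_frame < 1:
            raise ValueError("per_frame must be at least 1")
        self.per_frame = per_frame
        self._pending: Deque[Callable[[], Any]] = deque()

    def enqueue(self, factory: Callable[[], Any]) -> None:
        # `factory` stands in for an expensive call such as scene instantiation.
        self._pending.append(factory)

    @property
    def pending(self) -> int:
        # Number of factory calls still waiting to run.
        return len(self._pending)

    def process_frame(self) -> List[Any]:
        # Call once per frame; returns whatever was created during this frame.
        created = []
        for _ in range(min(self.per_frame, len(self._pending))):
            factory = self._pending.popleft()
            created.append(factory())
        return created


if __name__ == "__main__":
    spawner = DeferredSpawner(per_frame=1)
    for i in range(3):
        # Bind `i` as a default argument so each lambda keeps its own value.
        spawner.enqueue(lambda i=i: f"button_{i}")

    frame = 0
    while spawner.pending:
        frame += 1
        print(frame, spawner.process_frame())
```

Spreading the work this way does not make the total cheaper; it only trades one long stall for many short ones, so the elements appear progressively instead of all at once.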

Veradictus avatar Apr 05 '24 06:04 Veradictus

(scene as PackedScene).instantiate() and (node as Node).add_child() are slow (my scenes also contain Label nodes), but the editor profiler can't catch it. Keep that in mind.

luckyabsoluter avatar Apr 17 '24 05:04 luckyabsoluter

Adding more information in case it's useful.

On versions 4.2.2 (stable) and 4.3 (dev5), using C#, instantiating a PackedScene seems to slow down if it contains either Shader Materials or a Particle Process Material with "Resource Local To Scene" toggled on.

In my tests it's ~1.5 times slower with Shader Materials and ~10 times slower with Particle Process Materials.

Timings are without adding as child, only instantiating the PackedScene:

Node2D + Sprite2D without any Shader Material
Finished instantiating 10000 nodes: res://scene_no_materials.tscn
Total time 99 ms. 992842 ticks

Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned OFF
Finished instantiating 10000 nodes: res://scene_with_material.tscn
Total time 73 ms. 739936 ticks

Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned ON
Finished instantiating 10000 nodes: res://scene_with_instantiated_material.tscn
Total time 133 ms. 1335912 ticks

Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned OFF
Finished instantiating 10000 nodes: res://scene_with_particles.tscn
Total time 78 ms. 787835 ticks

Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned ON
Finished instantiating 10000 nodes: res://scene_with_instantiated_particles.tscn
Total time 782 ms. 7829485 ticks
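
For reference, the slowdown ratios can be recomputed from the tick counts above. This is a small Python sketch, not part of the original measurements: the dictionary keys are made-up labels, and the conversion assumes .NET-style ticks of 100 ns each, which is consistent with the millisecond values reported alongside them. Which baseline the roughly 1.5 times figure for Shader Materials refers to is not stated, so both candidates are shown.

```python
# Tick counts copied from the timings above (10000 instantiations each).
# The keys are hypothetical labels chosen for this sketch.
TICKS = {
    "sprite_no_material": 992842,
    "shader_local_off": 739936,
    "shader_local_on": 1335912,
    "particles_local_off": 787835,
    "particles_local_on": 7829485,
}

# Assumption: ticks are .NET-style 100 ns units, i.e. 10,000 per millisecond.
TICKS_PER_MS = 10_000


def to_ms(ticks: int) -> float:
    """Convert a tick count to milliseconds under the 100 ns assumption."""
    return ticks / TICKS_PER_MS


def slowdown(slow_key: str, fast_key: str) -> float:
    """How many times longer the `slow_key` run took than the `fast_key` run."""
    return TICKS[slow_key] / TICKS[fast_key]


if __name__ == "__main__":
    for key, ticks in TICKS.items():
        print(f"{key}: {to_ms(ticks):.1f} ms")

    # Shader Material: local-to-scene ON vs. OFF, and ON vs. no material at all.
    print(f"shader ON vs OFF: {slowdown('shader_local_on', 'shader_local_off'):.2f}x")
    print(f"shader ON vs none: {slowdown('shader_local_on', 'sprite_no_material'):.2f}x")

    # ParticleProcessMaterial: local-to-scene ON vs. OFF.
    print(f"particles ON vs OFF: {slowdown('particles_local_on', 'particles_local_off'):.2f}x")
```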

merksk8 avatar Apr 18 '24 09:04 merksk8

> On version 4.2.2 (stable) and 4.3 (dev5), using C#, seems to slow down the instance of a PackedScene if it contains either Shader Materials or Particle Process Material with "Resource Local To Scene" toggled on.

This is unrelated to the issue mentioned here, as the cause is entirely different.

In this situation, a shader needs to be compiled every time the PackedScene is instanced, because the Shader instance is unique. You need to ensure the shader resource is shared across instances somehow. Also, ParticleProcessMaterial needs more time to compile than a bare ShaderMaterial as it's much more complex (it's a premade ShaderMaterial with dozens of uniforms and potentially hundreds of lines of code).

Excessive shader amounts will also slow down drawing because of the high number of state changes/draw calls required.

Calinou avatar Apr 18 '24 14:04 Calinou

> This is unrelated to the issue mentioned here, as the cause is entirely different.

I see, sorry for mixing up the topics!

And thanks for the clear explanation. I'll keep that in mind and only use it when it's really needed.

merksk8 avatar Apr 19 '24 06:04 merksk8

@Calinou Any updates on this? Do you need help testing/debugging it? I'm not familiar with the Godot development process but can spend some time trying (before the sun comes back 🌞 )

duarteroso avatar Apr 30 '24 14:04 duarteroso

> Any updates on this? Do you need help testing/debugging it?

I suggest testing what I mentioned here: https://github.com/godotengine/godot/issues/71182#issuecomment-1483147660

Make sure to compile with release optimizations as well (production=yes), so that the result is more comparable with official builds, and use MinGW instead of MSVC if targeting Windows (as that's what official binaries use).

Calinou avatar Apr 30 '24 15:04 Calinou

> > Any updates on this? Do you need help testing/debugging it?
>
> I suggest testing what I mentioned here: #71182 (comment)
>
> Make sure to compile with release optimizations as well (production=yes), so that the result is more comparable with official builds, and use MinGW instead of MSVC if targeting Windows (as that's what official binaries use).

In my use case there are no labels or text involved. Simply instantiating a bunch of scenes that contain only a Control gives me this significant delay.

Will still try to run from source and take it from there 🚀

duarteroso avatar Apr 30 '24 19:04 duarteroso