scatter icon indicating copy to clipboard operation
scatter copied to clipboard

Reducing long loading times

Open SlashScreen opened this issue 2 years ago • 45 comments

I have a world with a large amount of props from scatter. When my scene gets loaded, through the editor or in-game, it causes Godot to freeze for a while as all the scatter props are recalculated. Is there a way I can avoid this? I have force rebuild off, and I am using a ScatterCache holding all of the transform information, but it seems to continue to attempt to recalculate everything on load, slowing my editor and the game down massively for a brief time. Is there any way I can avoid this?

SlashScreen avatar Aug 11 '23 02:08 SlashScreen

Also, despite force reload turned off, the props still take some time to recalculate when the scene is opened in the editor.

SlashScreen avatar Aug 12 '23 07:08 SlashScreen

I believe this is something @Norodix fixed a while ago in 95bf5fa8fb84da5dde0a9785f6cc5737e2e61f0a Which version of Godot and which version of the add-on are you using?

HungryProton avatar Aug 12 '23 12:08 HungryProton

Godot: 4.1.1 stable. Scatter: 4.0.0, on the assetlib. There is a possibility that I simply am not understanding how protonscatter is working, but when I first open the scene, the scatter nodes show the little loading symbol, and behave as expected.

SlashScreen avatar Aug 12 '23 21:08 SlashScreen

After doing some experimenting, it seems the only action that's being done on runtime is the cache restore, which is taking roughly 30 seconds. Is there anything I can do to speed it up? I understand not being able to multithread it, since it adds things to the scene tree.

SlashScreen avatar Aug 13 '23 11:08 SlashScreen

Perhaps instead of having several thousand trees, I instead have a fewer number of premade tree clusters.

SlashScreen avatar Aug 13 '23 11:08 SlashScreen

Do you mind sharing more info? Your CPU model, how many scatter nodes and how many objects are being scattered ?

For comparison, restoring half a million instances from a cache takes roughly 3 seconds on a 4 years old Ryzen 3700X, and I suspect most of that time comes from reading the cache from my hard drive.

If you only have a few thousands trees, it should not take that long.

HungryProton avatar Aug 13 '23 11:08 HungryProton

Here's my info:

  • CPU: Ryzen 7 5800X
  • Trees: Roughly 2500 copies of a number of different prefab tree/bush combinations, colliders on
  • Roughly 5500 grass clumps, instanced
  • Roughly 5000 dead branches, instanced
  • 1-2 hundred gravestones, instanced, colliders on
  • Spawned from an instance in the scene, rather than from disk
  • Scatter cache uses a tres, not a res
  • 11 scatter nodes

SlashScreen avatar Aug 13 '23 22:08 SlashScreen

This could be related to #154 Does the cache folder actually exists under res://addons/proton_scatter/cache and is there a cache file in there?

HungryProton avatar Aug 19 '23 12:08 HungryProton

There are caches both in the addons folder, and a folder I created myself. The one in the addons folder is outdated, and is not relevant.

I added more print statements to the cache rebuilder, and I found something surprising. I modified the code as shown here:

func restore_cache(force_restore := false) -> void:
	print("Restoring cache...")
	var rebuild_start = Time.get_ticks_usec()
	# Load the cache file if it exists
	_local_cache = load(cache_file)
	if not _local_cache:
		printerr("Could not load cache: ", cache_file)
		return

	_scatter_nodes.clear()
	_discover_scatter_nodes(_scene_root)

	for s in _scatter_nodes:
		if s.force_rebuild_on_load and not force_restore:
			continue # Ignore the cache if the scatter node is about to rebuild anyway.

		# Send the cached transforms to the scatter node.
		print("Restoring %s..." % s.name)
		var scatter_start = Time.get_ticks_usec()
		var transforms = ProtonScatterTransformList.new()
		transforms.list = _local_cache.get_transforms(s.name)
		print("Found %s transforms." % transforms.list.size())
		# There is something between these two print statements that causes it to hang for roughly 18 seconds, so much so that 
		# the engine, and the profiler, do not advance at all. 
		s._perform_sanity_check()
		s._on_transforms_ready(transforms)
		print("Restored %s in %s seconds." % [s.name, (Time.get_ticks_usec() - scatter_start) / 1_000_000.0])
	
	print("Restored cache in %s seconds." % ((Time.get_ticks_usec() - rebuild_start) / 1_000_000.0))

Which produced the following output:

Restoring cache...
Restoring Trees...
Found 2485 transforms.
Restored Trees in 3.596441 seconds.
Restoring dead branches...
Found 2962 transforms.
Restored dead branches in 2.357756 seconds.
Restoring Grass...
Found 6000 transforms.
Restored Grass in 2.495619 seconds.
Restoring Gravesite...
Found 53 transforms.
Restored Gravesite in 0.004314 seconds.
Restoring Gravesite7...
Found 53 transforms.
Restored Gravesite7 in 0.00412 seconds.
Restoring Gravesite8...
Found 52 transforms.
Restored Gravesite8 in 0.013086 seconds.
Restoring Gravesite2...
Found 53 transforms.
Restored Gravesite2 in 0.003777 seconds.
Restoring Gravesite3...
Found 53 transforms.
Restored Gravesite3 in 0.003776 seconds.
Restoring Gravesite4...
Found 53 transforms.
Restored Gravesite4 in 0.00371 seconds.
Restoring Gravesite5...
Found 54 transforms.
Restored Gravesite5 in 0.003788 seconds.
Restoring Gravesite6...
Found 52 transforms.
Restored Gravesite6 in 0.003791 seconds.
Restoring Gravesite9...
Found 120 transforms.
Restored Gravesite9 in 0.044845 seconds.
Restored cache in 8.592196 seconds.

However: There is a roughly 18 second delay after

Restoring Gravesite9...
Found 120 transforms.

Before it continues. I commented where that would be in the code above. The printout says it takes 8.6 seconds (still a long time), but in reality it takes around 30, it completely halts the engine. Gravesite9 is nondescript, I am not sure why it takes so long, or even if that is the issue. It looks like this:

image

I'll do some more testing in the potential problem functions. It looks like _on_transforms_ready has an await, so maybe that's the issue?

SlashScreen avatar Aug 20 '23 07:08 SlashScreen

Update: gravesite 9 hangs while doing _update_split_multimeshes(). Investigating further. I wonder if it lies on some sort of chunk boundary?...

SlashScreen avatar Aug 20 '23 07:08 SlashScreen

Another Update: More narrowing down to get_or_create_multimesh_chunk()

SlashScreen avatar Aug 20 '23 07:08 SlashScreen

Thanks, I'll run some tests on my side as well.

You can also completely disable the chunk logic in the ScatterNode, under Performance > Use Chunks and see if it helps.

HungryProton avatar Aug 20 '23 07:08 HungryProton

It hangs on this line. I wonder why this could be?...

(Also This could probably be optimized using Array.Resize() so you don't have to keep resizing.)

SlashScreen avatar Aug 20 '23 07:08 SlashScreen

Also. since _create_instance() calls add_child deferred, perhaps you could put it in a worker thread pool to speed up cloning. I see there is object pooling, so maybe create a list of things that need to be created, and update the ones that already exist. Then, after all that, set the worker pool loose upon the ones that need to be created.

SlashScreen avatar Aug 20 '23 07:08 SlashScreen

It hangs on this line. I wonder why this could be?...

One thing that might be interesting is that this call to 'set_owner' is a bit weird. If you print the owner before and after, it only affects anything if the show output in tree option is set. Otherwise the owner is still null. I assume it is because the owner has no owner in the scene tree? Have you tried commenting out this call?

Norodix avatar Aug 20 '23 10:08 Norodix

I was wrong. It's hanging on this line. I will do some more investigation, possibly tomorrow, since it is 3 in the morning.

SlashScreen avatar Aug 20 '23 10:08 SlashScreen

I noticed that in my case automatic LOD generation was taking a long time on load. @SlashScreen you can try disabling it in ScatterItem/Level Of Detail/Generate and see if it helps

LoipesMas avatar Aug 20 '23 17:08 LoipesMas

I noticed that in my case automatic LOD generation was taking a long time on load. @SlashScreen you can try disabling it in ScatterItem/Level Of Detail/Generate and see if it helps

After turning it off, it now freezes on this line. I feel like there's a deeper issue here than just LOD generation. I hope it's not an engine issue.

I'm wondering if, since get_or_create_multimesh_chunk() always queues free at the end, and it's called a lot, and it always frees at the last of all these loops, I wonder if the 18 seconds is because it's freeing like a thousand resources at once.

SlashScreen avatar Aug 20 '23 22:08 SlashScreen

But sometimes it freezes elsewhere, and rarely it doesn't free at all. What is going on?...

SlashScreen avatar Aug 21 '23 04:08 SlashScreen

I have not found these timing methods to be very reliable in tool scripts or when multiple threads are involved. Adding a wait until next process frame before recording the time helped things in my case, but it might not help you much.

Are you creating lots of instances that use project on colliders modifier? I suspect that can cause hangs because that is the only one (afaik) that runs in the main thread. Maybe try without those.

Norodix avatar Aug 21 '23 05:08 Norodix

That causes time when generating, sure, but the issue is that this is happening when restoring from cache.

SlashScreen avatar Aug 21 '23 06:08 SlashScreen

I think the root cause of the issue is the way instances are added to the scene. Copies seem to work much better but they use a deferred call to add children. That works fine for up to thousands of instances. It only has a momentary pause when it actually adds the children.

Changing the chunks to use the same deferred call seem to replicate this. I don't know if it has any unintended side effects. So far it seems okay to me.

Norodix avatar Aug 21 '23 19:08 Norodix

Moving the get_merged_meshes_from(item) out from the repeated call to generate the mmi chunk also seems to help significantly. I save it to a variable before the chunk loop and it feels much faster.

The get_or_create_multimesh_chunk function is mimicing the get_or_create_multimesh but maybe even more things could be rearranged to reduce repeated calculations.

Still, the biggest thing seems to be the deferred call to add children. I don't 100% understand how that works and whether it can cause problems. Eg. is it okay to modify properties after calling add_child in a deferred manner and before it is actually added? It seems to work, but there might be some traps.

Norodix avatar Aug 21 '23 20:08 Norodix

Still, the biggest thing seems to be the deferred call to add children. I don't 100% understand how that works and whether it can cause problems. Eg. is it okay to modify properties after calling add_child in a deferred manner and before it is actually added? It seems to work, but there might be some traps.

From my understanding, add_child only changes the parent of the node, so whether other properties are changed before or after that shouldn't matter (assuming that those properties don't rely on the parent node). And deferring the call just moves it to "idle time". Although I'm not sure why deferring the call would give a performance improvement. Maybe it gets batched somehow?

LoipesMas avatar Aug 21 '23 20:08 LoipesMas

call_deferred() waits to call a function until the next idle time- such as the await get_tree().process_frame in scatter's _on_tranforms_ready(). add child only adds a node to the scene tree, so while you're waiting for it to be called, the node object you created still exists, and you can do with it as you please, so long as you don't call anything that needs to query/affect the scene tree relative to the node (in which case, that would also need to be deferred), such as setting owners or getting parents.

While deferring the add_node causing a performance boost makes intuitive sense in my head, I cannot explain it with actual code.

@Norodix Would you mind sharing/committing what you've changed so I can paste it in on my end? And yes, I suspect a lot of things could be changed to avoid repeated calculations, and to better support multithreading (For example, perhaps instead of restoring the cache object by object, perhaps a worker pool could be dispatched to do it all at once, something I may experiment with.)

SlashScreen avatar Aug 21 '23 20:08 SlashScreen

You can also completely disable the chunk logic in the ScatterNode, under Performance > Use Chunks and see if it helps.

I was also noticing massive editor lag when chunks were enabled; however disabling chunks and the editor is now immediately responsive when changing any scatter configuration values in the inspector in a large 1024mx1024m terrain chunk. However, by making this change, the load times for the scene have gone up tremendously, despite using a ScatterCache.

Naros avatar Aug 22 '23 00:08 Naros

Disabling the chunking behavior had no effect on the hang.

SlashScreen avatar Aug 22 '23 00:08 SlashScreen

I don't want to commit it until it is confirmed that it works well. Here are the 2 functions that I changed:

https://pastebin.com/8HAw7T6X

Norodix avatar Aug 22 '23 05:08 Norodix

that didn't fix it. I'm trying to get a profiler to see what is going on.

SlashScreen avatar Aug 22 '23 06:08 SlashScreen

Weird, it makes a huge difference for me. But on my end disabling chunking also fixed the loading time, so your issue might be different.

I suggest you try to figure out the problem with chunking disabled, so there are fewer variables to consider. You might have already stumbled on a solution but because chunking was slow it masked the improvement.

Norodix avatar Aug 22 '23 07:08 Norodix