godot_heightmap_plugin icon indicating copy to clipboard operation
godot_heightmap_plugin copied to clipboard

Loading Terrain from a thread causes rendering and collision issues.

Open evie-calico opened this issue 4 years ago • 25 comments
trafficstars

I've consistently found that loading a scene containing terrain from a thread will cause strange issues. The LOD system seems to be inaccurate, chunks close to the camera don't show up, and collision only works from the very edges.

Steps to reproduce the behavior:

  1. Create a thread
  2. Load the scene using the thread
  3. Call wait_to_finish(), instancing the return value into the scene tree.
  4. Collision and rendering will be buggy.

evie-calico avatar Jan 14 '21 01:01 evie-calico

I would tend to think that instancing the terrain from a thread is not supported. One main reason being, the physics engines Godot uses are not thread-safe at all.

Although, in the way you say you load it (i.e instancing on the main thread), it might be another issue, so second thing to verify is, if the renderer threading model is appropriate. For example, SingleUnsafe would cause problems.

This plugin does not use any global variable at all. The loading functions solely care about the thing they load, so threading shouldn't be a particular issue.

Maybe one edge case could be to test if the loading is also broken if you comment out all OSes in that dictionary: https://github.com/Zylann/godot_heightmap_plugin/blob/362d78c265840b1b026040a9f9087c5366f33b2f/addons/zylann.hterrain/native/factory.gd#L7 If that fixes it, then it could be a GDNative issue. The C++ code in there is also thread-safe in itself, but maybe the bindings it uses or the GDNative module is buggy in regard to threads.

It could also be a problem in your code. I can't tell more without a test project showing the issue.

Zylann avatar Jan 14 '21 18:01 Zylann

I'm not doing any fancy loading, so I doubt it has anything to do with my code. After switching from multithreaded to single safe, I had the same issue. It looks like GDNative is causing the issue, commenting out Windows has solved the problem.

Single Safe, No collision: (Edit: this clip also shows the weird collision near the edges. When I swim out from the island, the edge pushes me underwater)

https://user-images.githubusercontent.com/14899090/104651905-5533da00-5686-11eb-81cc-5b42d097a2b3.mp4

No Native, Fixed

https://user-images.githubusercontent.com/14899090/104651916-58c76100-5686-11eb-83ad-c1cf1da7ff00.mp4

evie-calico avatar Jan 14 '21 21:01 evie-calico

Hmmm so that means there is a bug in Godot's GDNative implementation which makes things very not thread-safe when a C++ script is instanced from a thread the first time... finding where it does things wrongly is a chore I haven't enough motivation to look into at the moment :( thread-corruption reproduce often randomly. This is the C++ function that runs during loading: https://github.com/Zylann/godot_heightmap_plugin/blob/7ad5546bae2a50cb54a42d96ac4f77fd72425e9c/addons/zylann.hterrain/native/src/image_utils.cpp#L50

There might be a possible workaround to keep the C++ script working, but without knowing the cause, I'm not 100% sure if it will work: The theory is that somewhere in GDNative loading code, something thread-unsafe occurs. It could be loading, it could be instancing of the class for the first time. So the idea is to force this to happen at least once in the main thread, before you spawn your thread. Put back the dictionary entries in factory.gd. In your game code, before spawning your thread, add this member variable in one of your scripts, such that it would be created before your thread, and remains alive as long as your game runs:

var _image_utils = preload("res://addons/zylann.hterrain/native/factory.gd").get_image_utils()

This will force Godot to load the library and instance the C++ script inside at least once. If the bug still happens, then it may need more investigation I think.

There is a slim chance that it's not thread-corruption at all. Instead maybe the lib just failed loading completely for some reason (in which case there should be an error printed?), but then the relation to collisions is even weirder because the collider only gets created on instancing, not on load.

Zylann avatar Jan 14 '21 21:01 Zylann

Looks like that was it. I'll let you know if I have any more issues, thanks a lot. I can try some more things if you need more to go off of.

https://user-images.githubusercontent.com/14899090/104654930-ea38d200-568a-11eb-8a2e-821eb2c5476d.mp4

evie-calico avatar Jan 14 '21 22:01 evie-calico

I'll add the Need engine fix tag for now but it could also be the C++ bindings registration code.

Zylann avatar Jan 14 '21 22:01 Zylann

Its not a bug. Got the same issue. Thats because u spawning your player before the terrain collider was created by the plugin.

This is how u can fix it.

  1. Create a script an attach it to the scene 2.) Create a signal inside the script, call the signal from the "ready" function 3.) Connect your player spawn function with the new signal 4.) Loading the scene threaded, be sure you add the scene with call_deferred 5.) Tada is working

I think in the end, its just a timing problem.

Thats how i do it:

`    public void LoadMap()
    {
        //just ignore thats my own class for loading threaded scenes
        var loadingElement = new BackgroundLoaderItem("res://maps/TestMap.tscn");
        loadingElement.Connect("CompleteLoadEvent", this, "InitMap");
        backgroundLoader.Load(loadingElement);
    }
    
    //okai we got the resource
    private void InitMap(Resource res)
    {
        GD.Print("[Server] Map loaded");

        var scene = (PackedScene)res;
        map = (BaseMap)scene.Instance();
        map.Name = "map";
        
        //Thats the signal of the ready function inside the map
        map.Connect("MapLoadedComplete", this, "MapLoaded");
 
        mapHolder.AddChild(map);
    }
    
    private void MapLoaded()
    {
         //spawn your player
    }
    `

kalysti avatar Jan 27 '21 13:01 kalysti

@sboronczyk how do you explain this? https://github.com/Zylann/godot_heightmap_plugin/issues/236#issuecomment-760491229

There might be a problem in the game code as well regarding collision but I wonder why switching from the native to the GDScript implementation had an effect on this. GDScript is even slower so I feel like the collider would have been late compared to native.

Zylann avatar Jan 27 '21 13:01 Zylann

@sboronczyk how do you explain this? #236 (comment)

Thats just a threading problem. He creating and instancing the scene in another thread. Think about that your HTerrain.GD Script is now also running outside the main thread, and cant access some necessary values. Thats all.

The "background loading technic" is only for loading resources not more. You cant run scenes "well" inside a thread. Also they are some differences between C# Threads and Godot Threads.

I writing an open world game with a variety of threads, so blv me threading in godot is not so easy :-)

kalysti avatar Jan 27 '21 14:01 kalysti

I still dont understand how all this happens. I know timing problems cause this but here I'm trying to understand exactly what the sequence of events was in the two repros. The first fix before your post was litterally sticking a preload of the GDNative class before the loading. And this reliably fixed the problem. So the only possible explanation to me, would be that loading that small native library lazily during the loading of the terrain takes a lot of time (relatively), more time than if GDScript was used to compute the same thing slower. And that time difference is what made the game spawn the character before the terrain finished loading?

Zylann avatar Jan 27 '21 14:01 Zylann

"So the only possible explanation to me, would be that loading the native library lazily during the loading takes a lot of time (relatively), and that time is what made the game spawn the character before the terrain finished loading?". Thats exactly what i write before. They player node haven't the current state.

kalysti avatar Jan 27 '21 14:01 kalysti

There is also a "bug" "i remember it", when you block the main thread for some reasons (loading or adding big scenes) at the same time been your terrain is loading. Then textures or certain shapes are displayed incorrectly or not at all (randomly). But its in the end the same issue. Timing problem

kalysti avatar Jan 27 '21 14:01 kalysti

I'm not sure I understand how you're saying this happens. When this bug occurs, the collision only works in very specific places near the edge. It's not like I'm being dropped under the terrain before the collision is made, because I can fly back through it repeatedly, and spawn more objects which will fall through. I don't see how the entire terrain's collision is reliant on a few nodes.

evie-calico avatar Jan 27 '21 14:01 evie-calico

That GDNative library must take a lot of time to load from that thread, then. That could be suspicious. This is what I'm getting at. (of course your idea of signal to spawn the character is good and should be fixed if it's not doing that).

When I refer to "GDNative library", this is not the terrain script at all. It's only a small fraction of it, a utility class, which happens to be used by HTerrainData. This utility script has two versions: a GDScript one, and a GDNative one. Either of the two will be loaded. Everything otherwise is GDScript. Switching them has been shown to make a difference. And this issue has been only about the HTerrainData resource (which is by extension loaded by the call to load of the scene). Instancing happens in the main thread.

The connected problem I'm getting at, is that using GDNative (supposed to be fast) would take longer to load a terrain, which defeats the goal of GDNative^^" So perhaps loading the library up front is still a good idea, in addition to the signal fix for character spawning.

Zylann avatar Jan 27 '21 14:01 Zylann

1.) Collision will be genearted by the plugin 1.1.) At the same time you instanced the player and added as child 2.) Player has now "current state" of the collision. But the collision generation is not totaly finish.

This is happend in two cases: 1.) You blocking the main thread for some reasons (and "slowing" down the terrain plugin) 2.) You have a very large detailed terrain but a very small player scene. Same think just in another way

kalysti avatar Jan 27 '21 14:01 kalysti

Ok, I'll put aside that weird issue that GDNative scripts take longer to load than a GDScript.

I understood that the problem would be that the character is spawned before the terrain finished its setup. That totally makes sense and should be checked.

But...

When this bug occurs, the collision only works in very specific places near the edge. It's not like I'm being dropped under the terrain before the collision is made, because I can fly back through it repeatedly, and spawn more objects which will fall through.

This adds to my doubts now.

Here is some details: When a PackedScene is loaded from a thread, it will load resources inside it. It will not create nodes yet, and there should not be anything blocking the main thread when doing that (but there could be, bugs happen). HTerrainData is one of those resources. The collision shape is NOT created by HTerrainData. That resource is just a bunch of values and textures. It does not contain the collider. The collider (it's a single one) is created when the resource is assigned to the HTerrain node, by setting the internal _terrain_data property, here. Properties are set in PackedScene.instance(), which AFAIK, is called on the main thread, after the PackedScene finished to load (correct me if I'm wrong. If it does run in the thread too, this should definitely be fixed). The collision shape is made this way outside the loading because physics are not thread-safe. Btw, if the loaded scene contains CollisionShapes, it might cause trouble for that same reason.

If the character spawns too early, it will fall through. But after that, if the collider is working, spawning further objects on top (not below) should collide correctly.

If the resulting collider is corrupted (rather than just being absent), it likely means the heap got corrupted, by code which didnt properly handle being run from inside a thread. GDNative is one of those rarely-tested areas of Godot where issues like that are could happen, hence why I proposed the experiments earlier. I'm not excluding the possibility of a bug in there, given the evidence. But it could be something else of course. The fact that collisions still don't work even long after spawning, and only in some areas of the terrain, is something yet unexplained.

Zylann avatar Jan 27 '21 14:01 Zylann

I think you should also look at when and how the player is instantiated and assigned. A small demo project would be helpful. Then I can take a closer look at it. Remote diagnoses and assumptions rarely lead to anything :-) But thanks zylann for your quick activity on this case.

kalysti avatar Jan 27 '21 14:01 kalysti

First of all, thanks for the awesome plugin.

I'm doing some testing (linux) with a simple FPS player and a terrain (at the moment it's 4096 in size but I'm not sure how relevant it is). I was following Kasper Frandsen's tutorial and I used the same textures (1024), I haven't added detail levels with grass yet, I just have the terrain, the Player, and a simple CSG in the scene. Since it took a few seconds to load the scene I thought it would be a good idea to try ResourceInteractiveLoader in a thread to possibly give some feedback, and from that moment the problem started to manifest itself, no collision with the ground, except in some spots.

I tried to instantiate the Player scene dynamically in a Map01: _ready -> call_deferred but it doesn't seem to resolve.

Instead, with the MainMenu: preload of image utils before using the thread seems to solve.

Below is the code I am using to load the map.

MainMenu.gd loader:
var resource = null
var loader: ResourceInteractiveLoader
var thread

#var _image_utils = preload("res://addons/zylann.hterrain/native/factory.gd").get_image_utils()

func _process(_delta):
	if Input.is_action_just_pressed("ui_cancel"):
		get_tree().quit()

#func _on_Start_pressed():
#	var err = get_tree().change_scene("res://maps/Map01.tscn")
#	if err != 0:
#		print("error: change_scene failed (Map01)")

func _on_Start_pressed():
	thread = Thread.new()
	thread.start(self, "loading", "res://maps/Map01.tscn")

func loading(resource_path):
	loader = ResourceLoader.load_interactive(resource_path)
	update_progress()
	var state = loader.poll()	
	while true:
		if state == ERR_FILE_EOF:
			print("loaded")
			update_progress()
			resource = loader.get_resource()
			break
		elif state == OK:
			update_progress()
			state = loader.poll()
		else:
			print("ERROR: loader: ", state)
			resource = null
			break
	call_deferred("load_completed")

func load_completed():
	thread.wait_to_finish()
	if (resource != null):
		get_tree().change_scene_to(resource)

func update_progress():
	var percent = (loader.get_stage() + 1) * 100 / loader.get_stage_count()
	print(percent)

xenogenesi avatar Feb 03 '21 12:02 xenogenesi

At this point I would try to make a reproduction project only involving a GDNative library. It seems like the problem is not related to the terrain plugin.

Zylann avatar Feb 03 '21 12:02 Zylann

Experiencing the same issue as @xenogenesi with the ResourceInteractiveLoader. Will pack together reproduction project later.Its interesting that on macOS it works fine, the issue is only occurring on Windows and Linux.

chaeMil avatar Sep 08 '22 13:09 chaeMil

@chaeMil: I'm thinking Mac uses Metal and Win/Linux uses OpenGL as such... that might be the difference

Zireael07 avatar Sep 08 '22 13:09 Zireael07

@chaeMil: I'm thinking Mac uses Metal and Win/Linux uses OpenGL as such... that might be the difference

Im Still on 3.5 so it uses OpenGL as well. But the main problem is that the collisions are not working on Linux and Windows.

chaeMil avatar Sep 08 '22 13:09 chaeMil

A theory I had in earlier messages was that the loading of GDNative classes works badly in a multithreaded scenario, unless the classes were loaded up-front. Since the plugin has GDNative implementations only on Windows and Linux, that would explain why it works on Mac.

Zylann avatar Sep 08 '22 17:09 Zylann

@Zylann is it possible to use the non-native plugin on other platforms than mac?

chaeMil avatar Sep 08 '22 18:09 chaeMil

Ok managed to do it by commenting out supported_os entries and removing the native binaries from hterrain.gdnlib

chaeMil avatar Sep 08 '22 18:09 chaeMil

You can also preload the GDNative classes on supported platforms prior to triggering the threaded loading. It's a problem in how Godot loads them, which seems to disappear if you make sure the classes don't get lazily-loaded in another thread.

Zylann avatar Sep 08 '22 19:09 Zylann