godot-cpp icon indicating copy to clipboard operation
godot-cpp copied to clipboard

Editor crashes on shutdown if certain singletons were used in extension

Open mihe opened this issue 1 year ago • 8 comments

When using certain singletons like OS::get_singleton() from within a GDExtension it seems like internal::gdn_interface->object_get_instance_binding in the generated get_singleton() ends up storing callbacks on the editor side to OS::___binding_callbacks which point to functions on the extension side.

These callbacks (___binding_free_callback specifically) end up getting invoked on editor shutdown (at Object::~Object()) during memdelete(_os) in unregister_core_types(). The problem is that by then we've already unloaded the extension libraries in memdelete(native_extension_manager), resulting in trying to call a function that no longer exists, leading to an access violation crash.

This also applies to Engine::get_singleton(). More singletons might be affected that I haven't run into yet.

Editor: v4.0.beta.custom_build (ca25c6e0a) Architecture: x86_64 Compiler: MSVC (v19.33.31630) OS: Windows 11 (v10.0.22621)

Minimal repro:

#include <godot/gdnative_interface.h>

#include <godot_cpp/classes/os.hpp>
#include <godot_cpp/core/class_db.hpp>
#include <godot_cpp/core/defs.hpp>
#include <godot_cpp/godot.hpp>

using namespace godot;

void initialize_example_module(ModuleInitializationLevel p_level) {
    if (p_level == MODULE_INITIALIZATION_LEVEL_SCENE) {
        OS::get_singleton();
    }
}

extern "C" {

GDNativeBool GDN_EXPORT example_library_init(
    const GDNativeInterface* p_native_iface,
    const GDNativeExtensionClassLibraryPtr p_native_lib,
    GDNativeInitialization* p_native_init
) {
    GDExtensionBinding::InitObject init_obj(p_native_iface, p_native_lib, p_native_init);
    init_obj.register_initializer(initialize_example_module);
    init_obj.set_minimum_library_initialization_level(MODULE_INITIALIZATION_LEVEL_SCENE);
    return init_obj.init();
}

} // extern "C"

mihe avatar Oct 10 '22 08:10 mihe

I can reproduce this on current master.

saki7 avatar Apr 06 '23 03:04 saki7

@akien-mga cc: @vnen @dsnopek Could you triage this bug for 4.1? I have confirmed that this bug would make application crash every time if I use singleton instance in godot-cpp.

There is one unofficial patch which was published by 3rd party project: https://github.com/effekseer/EffekseerForGodot4/commit/a7f4fa438784cdf9e5118a6713b74d8be9e65e07

I have applied above patch to my local fork, and I have confirmed that the crash is gone. I think we can use it as a reference. However, I'm not 100% sure if the SingletonBinder approach is the correct way to solving this issue. I have asked the author to propose the patch here, but the author has not responded.

saki7 avatar Jun 07 '23 02:06 saki7

That SingletonBinder in EffekseerForGodot4 assumes that all referenced singleton bindings call ___binding_create_callback at some point, which is not the case, and will double-free if the extension does something like expose a custom physics server, where the "binding" is effectively created by the extension itself.

I've been running a somewhat similar hack/fix for the past couple of months, which does account for this, and seems to work as far as I know. I'm not proposing that this be merged upstream though, as I'm sure there are more esoteric interactions where this too will break.

Someone with a much deeper understanding of the godot-cpp bindings likely needs to take a closer look at the whole binding setup in godot-cpp.

mihe avatar Jun 07 '23 07:06 mihe

@mihe Your patch seems straightforward to me. Could you elaborate on the circumstances where your patch wouldn't work?

I'm eager to find a proper fix for this since I'm having a hard time convincing my teammates that Godot 4.x is production ready whereas it's crashing every time on program exit.

saki7 avatar Jun 07 '23 22:06 saki7

I'm not saying that it wouldn't work, I'm saying that I can't confidently state that it will work in the general case.

Again, someone with a deeper understanding of the bindings in godot-cpp would need to lead the charge here. I am not that guy. :)

EDIT: Also, my hack/fix prioritizes keeping the changes contained in one place, hence the liberal use of lambdas and whatnot. If this were to be merged upstream it should probably be heavily refactored and not circumvent the regular binding callbacks.

mihe avatar Jun 08 '23 07:06 mihe

I haven't done any experimenting with this, but based on the the description and the linked patch, it sounds like the problem would be solved if we deleted all the singleton wrappers in GDExtensionBinding::deinitialize_level() (which is called when unloading the extension)?

Assuming that's correct, I think we could probably solve this with a simpler patch. Object::~Object() is virtual (meaning deleting an Object * should call the child class' destructor too), and so I think we could have the get_singleton() functions store the Object * in a list when the wrapper is first created, and then GDExtensionBinding::deinitialize_level() could loop through and delete them all. But I don't know if that'd really work until I try it!

I'll add experimenting with this to my TODO list :-)

dsnopek avatar Jun 08 '23 12:06 dsnopek

it sounds like the problem would be solved if we deleted all the singleton wrappers in GDExtensionBinding::deinitialize_level()

I'm mostly recalling this from the murky depths of ancient memory, so don't take my word for it, but...

Core doesn't have any way of releasing/unsetting/clearing bindings, so no matter what you delete on the godot-cpp side of things you will still end up with stale function pointers getting invoked by core at shutdown. To make matters worse, some singletons outlive the extension libraries and some don't.

This is what my original PR was meant to address, by adding a clear_instance_binding to core and (somewhat crudely) use that from godot-cpp. It was however quickly co-opted about talks of hot-reloading and more general concerns of binding lifetimes, and subsequently sat unreviewed for 5+ months.

The hack/fix I'm currently using, as well as the one linked from EffekseerForGodot4, skirts around this by exploiting the null check for free_callback in Object::~Object and does the deallocation of the bindings "manually" at shutdown by exploiting static class destructors.

mihe avatar Jun 08 '23 13:06 mihe

Ah, ok, thanks!

Reading through the comments on your linked PR, Zylann brings up an important point that this issue probably affects all objects, not just singletons.

Hm, it's starting to sound like this is a bigger, trickier problem to solve than it seemed at first! When I have the time, I'll dig into it deeper.

dsnopek avatar Jun 08 '23 13:06 dsnopek

I can confirm that this is still a problem with engine singletons on exit, in my case the Time one. I'm not sure if OP suggests this only affects the editor, but I actually experience the crash when running the project. But the basic premise is the same, if you ever access the Time singleton, then on exit, as the engine tries to run the destructor for it, an attempt to call free_callback causes a crash due to invalid memory access.

image

I'm on https://github.com/godotengine/godot-cpp/commit/cc1217a43cab5ff8377c2f5e53301fda67def95e in godot-cpp and https://github.com/godotengine/godot/commit/aef11a14274f6f9e74ad91ead1d7c07ea1dd7f5f in godot.

YuriSizov avatar Apr 01 '24 11:04 YuriSizov

I'm not sure if OP suggests this only affects the editor, but I actually experience the crash when running the project.

I can't remember if I'd only seen it in the context of the editor at that time, but yes, it most definitely happens when shutting down the game/application as well.

I know at least one other published GDExtension (Debug Draw 3D) that suffers from this issue, causing a crash on shutdown every single time for anyone who has it loaded. I'm sure there are other ones out there as well.

I'm surprised this issue hasn't gotten more traction than it has. I guess people just don't run their editor/game through a debugger or something?

mihe avatar Apr 01 '24 11:04 mihe

I'm surprised this issue hasn't gotten more traction than it has. I guess people just don't run their editor/game through a debugger or something?

Well, unfortunately, I don't think we know how to solve it yet. I don't remember the details, but the last time we discussed this, it was shown that this issue affects more than just singletons, but singletons are the the easiest way to trigger it.

If anyone has a proposed solution, that'd be great! Otherwise, I'll personally look at it eventually - there's just so many things :-)

dsnopek avatar Apr 01 '24 16:04 dsnopek

Yes, sorry, I meant that I'm surprised this issue isn't flooded with people facing the same problem, not that I'm surprised that it hasn't been fixed yet. It's a tricky one.

I just feel like every non-trivial extension should run into this crash sooner or later.

mihe avatar Apr 01 '24 17:04 mihe

If anyone has a proposed solution, that'd be great!

Before looking into this my naive idea was to keep track/count references for every singleton (or, more broadly, every Godot object), and make sure to clear/null all the pointers when your extension doesn't need it anymore. This idea seems to be very similar to the kind of bookkeeping you've implemented for hot reloading extensions. Is there any reason why we cannot extend this approach to address the issue at hand?

I am probably missing something, but I'd appreciate it if this was resolved sooner rather than later. While crashes on exit will likely remain unnoticed by end users, it affects development process greatly, as you basically cannot exit your application or the editor normally, if you have a debugger attached.

YuriSizov avatar Apr 01 '24 17:04 YuriSizov

I'd appreciate it if this was resolved sooner rather than later. While crashes on exit will likely remain unnoticed by end users, it affects development process greatly, as you basically cannot exit your application or the editor normally, if you have a debugger attached.

If you just need a short-term fix for your own use, and you don't mind forking/patching godot-cpp, my workaround has been working well in Godot Jolt for a good long while now.

I haven't merged with upstream since 4.2-stable was released though, so it could be that it's due for another touch-up, but so far they've all been quick and painless.

mihe avatar Apr 01 '24 18:04 mihe

This idea seems to be very similar to the kind of bookkeeping you've implemented for hot reloading extensions. Is there any reason why we cannot extend this approach to address the issue at hand?

Well, when working on hot reloading, the feeling was that this bookkeeping added too much of a performance and memory hit to do in release builds. Hot reload is a developer tool, and so the goal was to only add this extra overhead during development.

It'd be great if we could either:

  1. Change the order that things are shutdown, such that we don't encounter the problem. There are other existing issues with the order that GDExtensions are initialized and shutdown relative to other things in Godot (ex GDScript trying to access GDExtension classes before they exist), so maybe a comprehensive re-think of that could fix it? Or,
  2. Somehow check if the GDExtension associated with free_callback is still loaded before calling it. This way we don't need to clear it on every object. We already associate the callbacks with a void *p_token which is the GDExtension * in the case of GDExtensions, which we could check for in GDExtensionManager. The only problem is that C# also uses these callbacks, and, in that case, the void *p_token is something else entirely. This won't be a problem once C# uses GDExtension, but for now, we'd probably need to track some new piece of data with the binding callbacks? Or, maybe, we can maintain a list of void *p_tokens that are no longer valid that we update when a GDExtension is unloaded, and then we don't run the free_callback if it's on the list?
  3. Some other clever idea we haven't thought of yet :-)

dsnopek avatar Apr 02 '24 16:04 dsnopek

Thanks for the write-up! I don't think number 1 is feasible with how the engine is designed. Both startup and shutdown are messy order-wise, and with extensions being a mechanism to put your fingers into almost every aspect of the engine, I don't know if there is a way to describe either routine in a non-conflicting way. Anything short of a manifest for each extension, which particular system it uses, won't let us make any assumption.

So I think that number 2 is more interesting and promising. To work around the C# problem, can we add a flag to the bindings struct that would indicate the user of the struct, be it an extension or the C# module? All we'd need is one bit, and then you could apply your proposed solution conditionally. Although C# may also have such issues. I remember fixing some bugs related to the order in which things happen where C#-scripted classes might've been accessed by a core system before scripting was ready. Sounds related to the overarching issue here.

YuriSizov avatar Apr 02 '24 18:04 YuriSizov

I ran into this too. In case it's useful, here's another "MRP" that adds the crash to this repo's test project, for ease of testing with pre-existing SCons scaffolding.

diff --git a/test/src/example.cpp b/test/src/example.cpp
index 3ec8bca..89e5d12 100644
--- a/test/src/example.cpp
+++ b/test/src/example.cpp
@@ -11,6 +11,7 @@
 #include <godot_cpp/classes/label.hpp>
 #include <godot_cpp/classes/multiplayer_api.hpp>
 #include <godot_cpp/classes/multiplayer_peer.hpp>
+#include <godot_cpp/classes/time.hpp>
 #include <godot_cpp/variant/utility_functions.hpp>
 
 using namespace godot;
@@ -297,6 +298,8 @@ Example::~Example() {
 
 // Methods.
 void Example::simple_func() {
+	int time = Time::get_singleton()->get_ticks_msec();
+	printf("time: %d\n", time);
 	emit_custom_signal("simple_func", 3);
 }

I only skimmed through the discussion, but if we're not sure about how to handle this properly for all potential problematic cases, but most users seem to just be affected by using singletons, I think a temporary fix like the one @mihe is using for godot-jolt might be worth considering, until we have a better idea.

akien-mga avatar May 07 '24 10:05 akien-mga

I've just posted PR https://github.com/godotengine/godot-cpp/pull/1458, which should fix this issue specifically for singletons. It doesn't address the general case where a GDExtension is unloaded before an Object can clean up its instance binding to that GDExtension.

dsnopek avatar May 07 '24 19:05 dsnopek