godot
godot copied to clipboard
Fix some race conditions that happen during initialization
Needed for #73777
This should be the last one involving initialization. Should I open a new PR or add it to this one?
diff --git a/core/os/thread.cpp b/core/os/thread.cpp
index 92865576f3..d533b1c1e2 100644
--- a/core/os/thread.cpp
+++ b/core/os/thread.cpp
@@ -58,13 +58,19 @@ void Thread::callback(ID p_caller_id, const Settings &p_settings, Callback p_cal
if (platform_functions.init) {
platform_functions.init();
}
- ScriptServer::thread_enter(); // Scripts may need to attach a stack.
+
+ // Currently causes a race condition due to the fact that threads are
+ // spawned before ScriptServer is even initialized.
+ // This is currently unused since neither GDScript nor C# require
+ // attaching a stack to new threads, so we just disable for now.
+
+ //ScriptServer::thread_enter();
if (platform_functions.wrapper) {
platform_functions.wrapper(p_callback, p_userdata);
} else {
p_callback(p_userdata);
}
- ScriptServer::thread_exit();
+ //ScriptServer::thread_exit();
if (platform_functions.term) {
platform_functions.term();
}
All of the possible workarounds for this one get ugly really fast since ScriptServer isn't always initialized, and any changes to main() that can fix this should probably be centralized in #49362 given the high likelihood of breaking something.
In the meantime, disabling thread_enter/exit() means that ThreadSanitizer can go into CI sooner, which would also aid debugging in case a new language that does need it is ever added.
If no scripting language is using thread enter/exit callbacks (and there are no near future plans for it), I'd even remove them.
However, I'm a bit unsure about the needs that language extensions developed separately may have (JS, Nim, Rust, whatever).
I am wondering if maybe we could somehow apply a compromise solution that patches the current situation to keep it functional and non-racey, even if somewhat hacky. I'm considering something like this:
- Add a
Mutex/SpinLocktoScriptServerso that_language_countand_languagesare treated atomically. - On
thread_enter(), take a snapshot of the language array and count in TLS. - On
thread_exit()notify only the languages which are in both the current languages array and the TLS copy. - Additional thoughts:
- Comparing current vs. snapshot languages by pointer would not be reliable; a possibility to workaround that would be that unregistered languages just leave a null entry in the array so they can be compared by index, or that the
ScriptLanguageclasses get some sort of serial number that lets differentiate among instances. - Maybe
Vector(), with its COW semantics, can somehow make the new code easier?
- Comparing current vs. snapshot languages by pointer would not be reliable; a possibility to workaround that would be that unregistered languages just leave a null entry in the array so they can be compared by index, or that the
- Comparing current vs. snapshot languages by pointer would not be reliable; a possibility to workaround that would be that unregistered languages just leave a null entry in the array so they can be compared by index, or that the
ScriptLanguageclasses get some sort of serial number that lets differentiate among instances.- Maybe
Vector(), with its COW semantics, can somehow make the new code easier?
Changing _languages into an array of Ref<ScriptLanguage>s and making a TLS copy in thread_enter() would probably be the cleanest way to follow through with this approach.
Unfortunately, it wouldn't help the main victim of this race condition, WorkerThreadPool. Since WorkerThreadPool threads are initialized early and live for the entire duration of the program, unlucky initializations could result in all uses of WorkerThreadPool not having a stack.
I don't know if there's any decision that can be taken here that doesn't leave something broken, including just leaving it as-is :shrug:
The ScriptServer workaround has been split off into #74501
Thanks!
Cherry-picked for 4.0.2.