godot icon indicating copy to clipboard operation
godot copied to clipboard

Fix physics/3d/run_on_separate_thread race condition in WorkerThreadPool (crash).

Open haasanen opened this issue 2 years ago • 5 comments

Fixes #67333, fixes #63671, fixes #63879 May also fix this: #61250

This mutex request fixes a race condition in WorkerThreadPool as far as I can tell. The race condition causes an eventual segmentation fault (crash). The comment on the groups.erase(p_group); is only true when physics/3d/run_on_separate_thread option in the project settings is disabled.

For example, add the following line next to groups.erase(p_group);:

OS::get_singleton()->print("Thread id: %u\n", Thread::get_caller_id()); // Debug line
groups.erase(p_group); // Threads do not access this, so safe to erase here.

If you have physics/3d/run_on_separate_thread disabled, it prints only a single thread id:

...
Thread id: 3401590889
Thread id: 3401590889
Thread id: 3401590889
Thread id: 3401590889
Thread id: 3401590889
Thread id: 3401590889
...

If you enable physics/3d/run_on_separate_thread, it prints two thread ids:

...
Thread id: 3840933713
Thread id: 3840933713
Thread id: 1959143244
Thread id: 3840933713
Thread id: 3840933713
Thread id: 1959143244
Thread id: 3840933713
Thread id: 3840933713
...

The same is true if you enable the 2D thread in physics/2d/run_on_separate_thread. If you have separate threads enabled for both 2D and 3D, there are still only two thread ids printed.

I understand that this might not be the ideal place to fix the race condition, as the comment on the line suggests that there should not be multiple threads accessing it. However, this does fix the UI crashing in 4.0 beta 3 (or since #63354).

To test this:

  1. Create empty project and enable physics/3d/run_on_separate_thread in the settings.
  2. Save an empty scene with Control node as root.
  3. Run project.
  4. To speed up crashing (could be a placebo): move your mouse on and off the window. This will create internal events. Within a minute or so, your program crashes without this PR fix.

Or you can run any existing project with the physics/3d/run_on_separate_thread enabled and it will crash within minutes without this PR fix.

I'd really appreciate if someone else also verified this as it works on my machine™ .

haasanen avatar Oct 20 '22 17:10 haasanen

To me the name run_on_separate_thread suggests that only one non-main thread should be starting those worker thread pools. Using your debug code, can you check if one of the threads is the main thread? If so, we should probably prevent the main thread from starting the worker thread pools in the first place. (I'm no expert on threads, so take my comment with a grain of salt.)

rburing avatar Oct 21 '22 09:10 rburing

To me the name run_on_separate_thread suggests that only one non-main thread should be starting those worker thread pools. Using your debug code, can you check if one of the threads is the main thread? If so, we should probably prevent the main thread from starting the worker thread pools in the first place. (I'm no expert on threads, so take my comment with a grain of salt.)

I modified the debug print statement to:

OS::get_singleton()->print("Thread id: %u, Main id: %u, is main: %s\n",
		Thread::get_caller_id(),
		Thread::get_main_id(),
		Thread::get_main_id() == Thread::get_caller_id() ? "yes" : "no"); // Debug line

With run_on_separate_thread disabled for both 3D and 2D:

...
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
...

With run_on_separate_thread enabled only for 3D:

...
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
...

With run_on_separate_thread enabled only for 2D:

...
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
...

With run_on_separate_thread enabled for both 3D and 2D:

...
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
...

haasanen avatar Oct 21 '22 10:10 haasanen

I've tested and can confirm this does fix #67333 for me.

jtnicholl avatar Oct 25 '22 19:10 jtnicholl

I've tested and can confirm this does fix #67333 for me.

Thanks for testing and confirming!

haasanen avatar Oct 25 '22 20:10 haasanen

This should be fine, performance should not really be an issue in this context.

reduz avatar Nov 28 '22 17:11 reduz

Thanks!

akien-mga avatar Nov 29 '22 12:11 akien-mga