godot icon indicating copy to clipboard operation
godot copied to clipboard

ERROR: Condition "p_elem->_root != this" is true.

Open thoraxe opened this issue 2 years ago • 17 comments

Godot version

v3.4.4.stable.mono.official.419e713a2

System information

Linux (Fedora), Intel Graphics

Issue description

I am getting this error, which appears to be a recurring problem now and again. It is not clear when it starts occurring or what causes it explicitly.

You can see my entire game client here: https://github.com/redhat-gamedev/srt-godot-client

There are many closed issues related to this error: https://github.com/godotengine/godot/issues?q=%22p_elem-%3E_root+%21%3D+this%22+is%3Aclosed

The two that are potentially the most closely related are the following: https://github.com/godotengine/godot/issues/22565 https://github.com/godotengine/godot/issues/20085

These both mention threading and node manipulation. The second one explicitly mentions the use of "add_child".

In the game client, there are really only two places where add_child is meaningfully used: https://github.com/redhat-gamedev/srt-godot-client/blob/main/Scenes/MainScenes/Game.cs#L134 https://github.com/redhat-gamedev/srt-godot-client/blob/main/Scenes/MainScenes/Game.cs#L165

Neither seems to reliably produce the error.

Steps to reproduce

Unfortunately I have no reliable reproducer.

Minimal reproduction project

N/A

thoraxe avatar Jul 13 '22 06:07 thoraxe

I don't think add_child is related, because the game client was sitting and doing nothing (no ships or missiles were created) and it suddenly started spewing the error.

thoraxe avatar Jul 13 '22 06:07 thoraxe

This is still present in 3.5.1.

I noticed that the issue cropped up while I was sending player move commands in the client. Here's the client send command method: https://github.com/redhat-gamedev/srt-godot-client/blob/main/Networking/ServerConnection.cs#L123-L143

Here's some output from the logs:

[12:58:06 DBG] Game.cs: Got move command
ERROR: Condition "p_elem->_root != this" is true.
   at: remove (./core/self_list.h:80)

Not sure this is related, but this is when it happened.

thoraxe avatar Nov 11 '22 18:11 thoraxe

Is there anything I could do to get additional debug data to help with testing, @Calinou ?

thoraxe avatar Nov 14 '22 18:11 thoraxe

Is there anything I could do to get additional debug data to help with testing, @Calinou ?

We need a minimal reproduction project, with the code isolated as much as possible, and in GDScript (to rule out C# being the cause of the bug). We can't do anything about this otherwise.

Calinou avatar Nov 14 '22 18:11 Calinou

@Calinou You are describing a bit of a chicken-egg troubleshooting challenge. I could create a reproducer if I knew what the problem was. But without additional debugging being spit out by Godot, I can't figure out what the problem is to create a reproducer. I don't even know where in the trace of my code path this error starts being generated or what part of Godot is causing it to continue to be generated.

Even if we could figure out more specifically where the problem lies, having to rewrite the reproducer in GDScript could potentially be impossible if the problem is being caused by C#. As this project uses various C# libraries that have no possible analogs in GDScript, this seems like a pretty huge uplift.

Put yourself in my shoes -- how would you begin to try to create a reproducer?

thoraxe avatar Nov 14 '22 20:11 thoraxe

Even if we could figure out more specifically where the problem lies, having to rewrite the reproducer in GDScript could potentially be impossible if the problem is being caused by C#.

If the issue is caused by C# there's no need to rewrite it in GDScript, an MRP in C# would suffice. We ask for MRPs in GDScript because it's easier for contributors to test (not every contributor uses C#) so you are more likely to get attention. Also, if creating the the equivalent MRP in GDScript does not reproduce the bug, that is a clear indication that the bug is specific to C# so it gives us a hint as to where the issue may come from.

I could create a reproducer if I knew what the problem was. But without additional debugging being spit out by Godot, I can't figure out what the problem is to create a reproducer. I don't even know where in the trace of my code path this error starts being generated or what part of Godot is causing it to continue to be generated.

You mentioned that you get ERROR: Condition "p_elem->_root != this" is true. The only assert I could find with that condition is in SelfList<T>::List::remove(SelfList<T>):

https://github.com/godotengine/godot/blob/0803b4168df1cc47ff440dd2419b358e2fccea41/core/self_list.h#L80

Is this the only output you get in the console? Is there anything else that may indicate where the error comes from? Like previous messages that may indicate what was being executed immediately before the error.

You can try removing parts of your game until the error stops happening to try and find what causes it, you probably already have an idea of what part of the code is most likely the cause based on what you were doing when the error occurred so try with that first.

Since you seem to be using asynchronous methods, take a look at those because it's likely they are not synchronized with Godot's main thread. For example, you are calling AddChild as a response of some asynchronous event, see if adding the child with CallDeferred fixes your issue:

CallDeferred("add_child", missileInstance);

Alternatively, if you are willing to build Godot from source, you can try running the binary with gdb or lldb to get a full stacktrace that can indicate where the error comes from.

raulsntos avatar Nov 17 '22 19:11 raulsntos

This PR switched everything to CallDeferred but apparently @roddiekieley still saw the error.

Is this the only output you get in the console? Is there anything else that may indicate where the error comes from? Like previous messages that may indicate what was being executed immediately before the error.

Other than what I've shared above, there is no indication of what is spawning the error. But, once it starts, it seems to spew forever until the client is stopped.

It sounds like building Godot from the source may be the only option to troubleshoot further because I can't imagine we will make much more progress without a full stack trace.

thoraxe avatar Nov 21 '22 20:11 thoraxe

I took a quick spin of the docs and didn't see anything special related to generating debugging symbols. Does this happen automatically when building?

thoraxe avatar Nov 21 '22 20:11 thoraxe

~~To build with debug symbols use the debug_symbols=yes argument in the SCons command.~~

~~I believe the docs about compiling may not have been updated for 4.0 yet, take a look at this PR that changed some arguments recently: https://github.com/godotengine/godot/pull/66242~~

~~Also, take a look at the mono module README.md for instructions specific to C#.~~

Oops, totally missed that this issue was about 3.x. I believe in 3.x the debug symbols are generated by default.

raulsntos avatar Nov 21 '22 20:11 raulsntos

OK well, I have it built, but I don't understand how to use gdb. It keeps pausing when nothing bad is happening.

What is the gdb magic so that it will only pause/backtrace/whatever when the ERROR: Condition "p_elem->_root != this" is true. happens?

thoraxe avatar Nov 22 '22 14:11 thoraxe

@thoraxe I imagine you need this: https://www.cse.unsw.edu.au/~learn/debugging/modules/gdb_conditional_breakpoints/

Zireael07 avatar Nov 22 '22 19:11 Zireael07

@raulsntos unfortunately, ERR_FAIL_COND is a macro, and is not part of self_list.h. Per this StackExchange post GDB is not capable of debugging/breaking macros. This leaves me very few choices.

If I set the breakpoint at line 80, I get a PILE of breakpoints, and Godot effectively gets broken/paused constantly because gdb is doing some bizarre macro expansion or something.

(gdb) break ./core/self_list.h:80
Breakpoint 1 at 0x8f500d: ./core/self_list.h:80. (116 locations)

The above makes Godot unusable.

Do you have any other suggestions on how I can use GDB to only break/pause/etc when it spits out this error? Or is there a way to make Godot spit out the backtrace when that error happens? Like can I add something to the macro code?

thoraxe avatar Nov 23 '22 20:11 thoraxe

Would it be acceptable if I took the macro code and modified self_list.h in order to execute the same code? This would allow me to set a breakpoint on the actual condition/code in self_list. It wouldn't be exactly the same Godot, and there's a chance that a timing would be modified, but I'd at least be able to backtrace when I got to that point where the error is kicked.

WDYT?

thoraxe avatar Dec 01 '22 16:12 thoraxe

AFAIK the macro is just a compile-time replacement so it should be the same, also feel free to make any modifications you need if they help you debug the issue such as using print_line or WARN_PRINT.

Once you locate the source of the issue, you may be able to reproduce it more easily without modifying Godot's source code.

raulsntos avatar Dec 01 '22 16:12 raulsntos

Yes, I'm going to swap the macro call for the code from the macro itself and put that in self_list.h so that I can set a breakpoint.

Alternatively, I have not tried to use VScode to build the Godot project, which could let me debug/backtrace directly in VScode. That may be my next step if I continue to struggle with GDB.

thoraxe avatar Dec 01 '22 16:12 thoraxe

I added CRASH_NOW() inside the method that checks for this error condition, and got this output: https://gist.github.com/thoraxe/f56fd138a23f43be56d82eb9046093de

I'm not sure if this is helpful.

thoraxe avatar Dec 06 '22 13:12 thoraxe

OK, I was finally able to get the debugger inside the game at the time the error condition is raised. Here's the stack at that moment:

SelfList<Node>::List::remove(SelfList<Node>*) (/home/thoraxe/Red_Hat/openshift/godot/core/self_list.h:82)
SceneTree::flush_transform_notifications() (/home/thoraxe/Red_Hat/openshift/godot/scene/main/scene_tree.cpp:187)
SceneTree::idle(float) (/home/thoraxe/Red_Hat/openshift/godot/scene/main/scene_tree.cpp:661)
Main::iteration() (/home/thoraxe/Red_Hat/openshift/godot/main/main.cpp:2298)
OS_X11::run() (/home/thoraxe/Red_Hat/openshift/godot/platform/x11/os_x11.cpp:3987)
main (/home/thoraxe/Red_Hat/openshift/godot/platform/x11/godot_x11.cpp:59)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main@@GLIBC_2.34 (@__libc_start_main@@GLIBC_2.34:43)
_start (@_start:15)

I don't know if this tells you anything.

thoraxe avatar Dec 06 '22 14:12 thoraxe

@raulsntos FYI I took all of the async things that were happening when messages were received and put them into queues that are then processed during _Process on a node. This is what the server code already does and we've never seen this error on the server-side.

I did a quick test with a bunch of players and did not hit this error. We're going to do a deeper soak test later this week but this might have fixed it. Will report back accordingly.

thoraxe avatar Jan 04 '23 21:01 thoraxe

Did it?

laingawbl avatar Mar 01 '23 03:03 laingawbl

Just to +1 thoraxe's possible solution, I'd been bashing my head against the same error for a while. In my case (it sounds like similarly) I was spawning + transforming nodes based on a signal that fired when a network connection received an update. I peeked at his solution here and went for a similar Queue-based approach instead and I haven't seen the error since. Possibly relevant is that I was doing it on multiple separate events and it was only an issue on handlers that would try to affect more than a dozen or so nodes at a time.

choskyo avatar Mar 22 '23 23:03 choskyo

Since switching to using queues, we have apparently eliminated this problem. It no longer seems to be happening. I can't claim direct causation, but the correlation is strong. However, using standard C# Queue was also causing problems where sometimes threading would try to operate on elements already gone from the queue. We switched to ConcurrentQueue to resolve those issues.

I am closing this as no longer an issue due to a lack of other evidence to show that it's still a problem.

Thanks, all, for the tips.

thoraxe avatar Mar 23 '23 13:03 thoraxe