godot icon indicating copy to clipboard operation
godot copied to clipboard

Prevent infinite recursion in first `_draw`

Open AThousandShips opened this issue 1 year ago • 4 comments

Caused by size changes fired in some cases, mainly by auto-translation, and due to that _redraw_callback is called from a flush causing any queued methods to be fired immediately.

Added pending_update = false; to the _exit_canvas method to ensure things behave reasonably in some edge cases like changing the world_2d or reparenting self during _draw which would otherwise never fire a redraw. This is not really reasonable use but I feel it's good to not introduce weird behavior in these cases.

The underlying cause here seems to be that when entering the tree a bunch of different calls are made including several that unconditionally resize the node. These are likely inefficient and could be improved but this offers a fix that ensures things behave reasonably in these cases.

So I'd say this is a functional fix, we can look at reducing unnecessary calls to queue_redraw and resizing calls on entering the tree etc. (it happens in things like NOTIFY_TRANSLATION_CHANGED and a bunch of others) but this should ensure things run smoothly.

MRP for testing: recursive_draw.zip

  • Fixes: https://github.com/godotengine/godot/issues/95810
  • Fixes: https://github.com/godotengine/godot/issues/95709

AThousandShips avatar Sep 22 '24 14:09 AThousandShips

What seems to be happening here is that:

  • Node receives the NOTIFICATION_ENTER_TREE notification, fires the NOTIFICATION_TRANSLATION_CHANGED notification, which queues a draw (and size changes as well, which might or might not be relevant to that part)
  • CanvasItem receives the NOTIFICATION_ENTER_TREE notification, calls _enter_canvas which clears pending_update which forces a second draw
  • When the message queue is flushed the two draw calls fire back-to-back
  • The child node is added, this fires resizing in the child, which in turn fires resizing in the parent, when this happens in the second draw call pending_update is once again false as it was cleared at the end of the first draw, this allows a new draw to be queued immediately

If you remove the add_child call from the MRP and run it without this PR you will notice it draws twice, when it should only draw once, this removes that double call as well (which is the cause of the bug)

AThousandShips avatar Sep 22 '24 14:09 AThousandShips

Unsure if this can be triggered in 4.2 but adding a cherry pick for it as the fix is limited in scope

Unsure if there's any weird behavior triggered by this same code in 3.x (it was added back before Godot was open source) but can make a dedicated fix for it if desired to eliminate any potential issues there too (don't have a 3.x setup to test myself right now)

AThousandShips avatar Sep 22 '24 14:09 AThousandShips

The simplicity of this fix has me wonder if doing it this way has any unforeseen consequences.

Mickeon avatar Sep 22 '24 22:09 Mickeon

I've dug through and tested various cases, the pending_update clearing in _enter_canvas only matters in the narrow edge case of when canvas operations happen during a draw call, as it will always be cleared otherwise, to be entirely safe (with the cost of losing the case of changing the world or re-parenting the node during a draw call) I could remove it in _exit_canvas, but there simply isn't any need for the pending_update clearing in _enter_canvas and it becoming a problem was really only revealed by the translation changes, it was always a risk just not caused by any existing condition (that I know of, there could be cases that cause this in older versions just haven't found any)

AThousandShips avatar Sep 23 '24 07:09 AThousandShips

What is current status of this PR ?

syntaxerror247 avatar Oct 03 '24 11:10 syntaxerror247

It's awaiting review, it works well but needs some decision from the core team

AThousandShips avatar Oct 03 '24 11:10 AThousandShips

The PR itself makes sense to me. But I am a little skeptical seeing as the behaviour changed between 4.2 and 4.3, but the code this PR changes hasn't been touched in many years.

I suspect that it would be best to solve this bug in the spot that introduced the regression otherwise we will be risking breaking other untested code paths in subtle ways

clayjohn avatar Nov 21 '24 08:11 clayjohn

I agree, will look at pinpointing what changed and see if any changes to this fix are needed

AThousandShips avatar Nov 21 '24 09:11 AThousandShips

I have an alternative solution and will push momentarily, will keep the current fix for a potential follow-up fix if we can confirm the changes are safe to future proof this code

AThousandShips avatar Nov 22 '24 16:11 AThousandShips

Re-confirming this still fixes the linked bugs but this should be a conservative fix still resolving the problem

Edit: Confirmed, they all now work correctly

AThousandShips avatar Nov 22 '24 16:11 AThousandShips

Great work tracking that down! This indeed seems like a much safer change to make. Pinging @YeldhamDev to confirm that the translation stuff still looks good

clayjohn avatar Nov 22 '24 19:11 clayjohn

Thanks!

Repiteo avatar Dec 04 '24 17:12 Repiteo

Thank you!

AThousandShips avatar Dec 04 '24 18:12 AThousandShips