jfx
jfx copied to clipboard
8285893: Hiding dialog and showing new one causes dialog to be frozen
This PR is based on a discussion that happened over in PR #1324. Some of this explanation is copied from that thread.
When exitNestedEventLoop is called on the innermost loop the invokeLaterDispatcher suspends operation until the loop finishes. But if you immediately start a new event loop the previous one won't finish and the dispatcher will jam up and stop dispatching indefinitely.
When the invokeLaterDispatcher is told that the innermost loop is exiting it sets leavingNestedEventLoop to true expecting it to be set to false when the loop actually exits. When the dispatcher is told that a new event loop has started it is not clearing leavingNestedEventLoop which is causing the jam. Basically it should follow the same logic used in glass; leaving the innermost loop updates a boolean indicating that the loop should exit but if a new loop is started the boolean is set back to a running state since it now applies to the new loop, not the previous one.
I suspect the invokeLaterDispatcher exists in part to deal with the specifics of how deferred runnables are handled on the Mac. I investigated this a bit and wrote up some comments in the Mac glass code.
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)
Issue
- JDK-8285893: Hiding dialog and showing new one causes dialog to be frozen (Bug - P3)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1449/head:pull/1449
$ git checkout pull/1449
Update a local copy of the PR:
$ git checkout pull/1449
$ git pull https://git.openjdk.org/jfx.git pull/1449/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1449
View PR using the GUI difftool:
$ git pr show -t 1449
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1449.diff
Webrev
:wave: Welcome back mfox! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.
As with the earlier PR, this will need a lot of testing and careful review.
@Maran23 I'd be interested in your thoughts on this approach.
/reviewers 2
@kevinrushforth The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).
This looks like a safe, targeted fix for the problem. It will need to be tested on all platforms with a full run of all system tests.
I get the following internal assertion failure when running the test on Linux:
NestedEventLoopTest > testCanExitAndThenEnterNewLoop FAILED
java.lang.AssertionError: Internal inconsistency - wrong EventLoop
at javafx.graphics@23-ea/com.sun.glass.ui.EventLoop.enter(EventLoop.java:108)
at javafx.graphics@23-ea/com.sun.javafx.tk.quantum.QuantumToolkit.enterNestedEventLoop(QuantumToolkit.java:656)
at javafx.graphics@23-ea/javafx.application.Platform.enterNestedEventLoop(Platform.java:310)
at test.javafx.stage.NestedEventLoopTest.lambda$testCanExitAndThenEnterNewLoop$29(NestedEventLoopTest.java:326
It ran cleanly on Mac and Windows.
I'll look into the Linux failure. The core EventLoop code passes an object into the application's leaveNestedEventLoop and expects to see that object returned from the applications's matching enterNestedEventLoop call. On Mac and Windows this object is passed through to glass as an argument on the stack. On Linux this value is handled as an application global. I suspect the Linux bookkeeping isn't robust enough to handle this case but it will take a bit to nail down the details.
I now understand what’s going wrong on Linux but I don’t know how to fix it.
When the test case tries to leave the innermost loop (let’s call it A) the cross-platform Java code calls Application.leaveNestedEventLoop to tell glass that the loop should exit. But before the loop can exit the test case starts a new loop leading to a call to Application.enterNestedEventLoop.
On Mac the leaveNestedEventLoop call sets a few global state variables in glass and the enterNestedEventLoop call wipes out all those changes. So as far as glass is concerned the leaveNestedEventLoop call was never made. This works because the cross-platform Java code is holding onto an object that represents loop A; it’s down one level in the event loop stack and marked as LEAVING. When that object reaches the top of the event loop stack the core will once again call Application.leaveNestedEventLoop to exit loop A.
On Linux leaveNestedEventLoop makes changes in glass that cannot be erased or undone (it calls gtk_main_quit which updates internal GTK state that we don’t have access to). In the end GTK will exit the loops in the correct order but it will exit loop A before the core code has a chance to call Application.leaveNestedEventLoop again. This is throwing off the bookkeeping including some debug checks.
(JavaFX struggles with this scenario because the original test case behaves in such an unexpected way. While processing a click event on a modal stage the event handler hides that modal and then immediately starts a new modal. The original modal can’t really go away; the handler for the click event originated there and is still in progress. That first modal might not be visible but it can’t truly close until the second modal is dismissed.)
I'll look into the Linux failure. The core EventLoop code passes an object into the application's leaveNestedEventLoop and expects to see that object returned from the applications's matching enterNestedEventLoop call. On Mac and Windows this object is passed through to glass as an argument on the stack. On Linux this value is handled as an application global. I suspect the Linux bookkeeping isn't robust enough to handle this case but it will take a bit to nail down the details.
There is one additional change I made, which might be relevant for Linux: https://github.com/openjdk/jfx/pull/1324/files#diff-af779aafb50953f57cab2478dd220d0322592b60e92065cf658644866572b7e7R117
Worth to check. I remember that the code there was problematic to me.
Otherwise this looks good. https://github.com/openjdk/jfx/pull/1324 also cleans up the unused return value which is allocated in the C++/Objective-C side, but never really used on the Java side. So might be worth to do at some point, but I would agree to do the minimal changes first.
On Linux leaveNestedEventLoop makes changes in glass that cannot be erased or undone (it calls gtk_main_quit which updates internal GTK state that we don’t have access to). In the end GTK will exit the loops in the correct order but it will exit loop A before the core code has a chance to call Application.leaveNestedEventLoop again. This is throwing off the bookkeeping including some debug checks.
That is exactly what I also figured out. If a second new nested event loop is started directly after the first one was requested to leave, the native code will never 'return'.
That's why this code exists: https://github.com/openjdk/jfx/blob/master/modules/javafx.graphics/src/main/java/com/sun/glass/ui/EventLoop.java#L118
which I slightly modified to not run invokeLater in my PR.
Otherwise it would never return any value and code after enterNestedEventLoop is never called:
Object o = Platform.enterNestedEventLoop(this);
System.out.println(o); // <- never called
And since you fixed the event jam so that we are not stuck anymore and Linux is implemented different with a global stateful variable, we will get that 'inconsistency', which I don't think is relevant or a problem here, hence I removed it in my PR.
I also don't found any other way to deal with that situation, as you also stated, there is nothing else we can do, and while the other OS implementation have some kind of while loop where we can try to implement something, on the Linux side we completely rely on gtk_main and gtk_main_quit.
That is exactly what I also figured out. If a second new nested event loop is started directly after the first one was requested to leave, the native code will never 'return'. That's why this code exists: https://github.com/openjdk/jfx/blob/master/modules/javafx.graphics/src/main/java/com/sun/glass/ui/EventLoop.java#L118 which I slightly modified to not run
invokeLaterin my PR.
I think the invokeLater call is required. On Mac and Windows calls into Glass can only affect the innermost loop. For each loop you want to exit you have to call Application.leaveNestedEventLoop and then allow the execution stack to unwind to Glass so it can remove the loop. Only then can you initiate the process of leaving the next loop. So if there’s multiple loops in a LEAVING state each one needs to be handled in a separate runnable so the execution stack can unwind between runnables.
When leaving the innermost loop the steps of interest are:
- call Application.leaveNestedEventLoop
- disable the InvokeLaterDispatcher
- unwind the execution stack to glass
- enable the InvokeLaterDispatcher
My tweak to the code ensures that the InvokeLaterDispatcher is re-enabled if we start at step 1 but enter a new event loop before we can reach step 3. On Mac and Windows we’ll always restart the leaving process at step 1 for all loops. That’s not happening on Linux, the old loop is leaving without starting at step 1. I’m pretty sure this is opening a window where the InvokeLaterDispatcher can dispatch a runnable to the old loop while it’s leaving. That’s not supposed to happen but I’m not sure what the consequences might be on Linux (I know it would be very bad on Mac).
@beldenfox This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
I don’t think this issue needs to be fixed. In the original bug report a click event hides its window and then pops a modal. While the modal is up the original window isn’t visible onscreen but has to stick around behind the scenes until the modal goes away. I don’t think the system should be left in this state and I doubt most developers expect to have a half-hidden window sticking around waiting for the click event to unwind.
It’s easy to detect this case and issue a warning to let the developer know that they should be popping the modal in a runLater runnable. But maybe I’m missing a use case for this particular call sequence?
With that said it is possible to bring Linux in line with the other platforms where leaveNestedEventLoop just sets a flag indicating that the current loop should exit after the current runnable is done. I’ve updated this PR with the code (and marked it Draft because my testing has been light).
But, again, I don’t think we should try to fix this.
The error is reproduced if the button click handler closes the current form and shows the Alert. The Alert is shown empty.
I don’t think this issue needs to be fixed. In the original bug report a click event hides its window and then pops a modal. While the modal is up the original window isn’t visible onscreen but has to stick around behind the scenes until the modal goes away. I don’t think the system should be left in this state and I doubt most developers expect to have a half-hidden window sticking around waiting for the click event to unwind.
It’s easy to detect this case and issue a warning to let the developer know that they should be popping the modal in a runLater runnable. But maybe I’m missing a use case for this particular call sequence?
With that said it is possible to bring Linux in line with the other platforms where leaveNestedEventLoop just sets a flag indicating that the current loop should exit after the current runnable is done. I’ve updated this PR with the code (and marked it Draft because my testing has been light).
But, again, I don’t think we should try to fix this.
I understand your reasoning. Technically, this bug can not be fixed. Because of how every OS works and how the the JavaFX side is working as well.
But as a developer, this looks just 'stupid' in a way that you are asking yourself: Why can't I show another dialog when I just closed one? It would be convenient if JavaFX can handle that. A warning would be okay as well, but I still hope that we find a way to fix this problem.
The first time I saw this bug was when I was showing a dialog with a progressbar, and when it finished, closed it and showed a new dialog to notify that everything was successful. This dialog suffered from the bug and was completely white. To fix that problem, I changed the logic to show the 'success' dialog before closing the 'progressbar' dialog, because that worked. But that felt just very weird and somewhat stupid.
So I can see both sides here and it is hard to think about a somewhat good or good-enough solution for that problem.
But as a developer, this looks just 'stupid' in a way that you are asking yourself: Why can't I show another dialog when I just closed one? It would be convenient if JavaFX can handle that. A warning would be okay as well, but I still hope that we find a way to fix this problem.
I should not have given my opinion on whether this bug should be fixed or not. I was basing that mostly on my distaste at the idea that an invisible window would be hanging around waiting for the click event to end. But that's beside the point; if we don't address this it puts a burden on developers and that's what matters. And I can't really address the extent of that burden with any authority.
I think this bug can be fixed. In other words, I think we can bring Linux in line with the other platforms and support the sequence of leave/enterNestedEventLoop calls that you want. This PR contains a first pass at doing that. I can always take it out of draft and see if we can get some review cycles on it.
@beldenfox This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
I should not have given my opinion on whether this bug should be fixed or not. I was basing that mostly on my distaste at the idea that an invisible window would be hanging around waiting for the click event to end. But that's beside the point; if we don't address this it puts a burden on developers and that's what matters. And I can't really address the extent of that burden with any authority.
No, I do completely agree and understand that. I just don't have any other idea right now. :-(