jfx icon indicating copy to clipboard operation
jfx copied to clipboard

JDK-8269921 Text in Textflow and listeners on bounds can cause endless loop/crash and other performance issues

Open FlorianKirmaier opened this issue 4 years ago • 15 comments

It's "a bit" complicated. In some situations, getRuns get's called because listeners on bounds are set. This causes TextFlow to layout to compute the runs. Afterward, the bounds of the parents get updated. This triggers a call to compute bounds - which cascades up to the children. When the geometry of the previous Text gets computed in this big stack - it throws an nullpointer. The Text doesn't have its runs, and calling TextFlow.layout is now a noop (it detects repeated calls in the same stack)

In the case it happened - it didn't repair and the application kinda crashed. This bug most likely can also be triggered by ScenicView or similar tools, which sets listeners to the bounds. It also can cause unpredictable performance issues.

Unit test and example stacktrace are in the ticket.

The suggested fix makes sure that recomputing the geometry of the Text, doesn't trigger the layout of the TextFlow. The Textflow should be layouting by the Parent. This might change the behavior in some cases, but as far as I've tested it works without issues in TextFlow Heavy applications.

Benefits:

  • Better Tooling Support For ScenicView etc.
  • Fixes complicated but reproducible crashes
  • Might fix some rare crashes, which are hard to reproduce
  • Likely improves performance - might fix some edge cases with unpredictable bad performance

Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8269921: Text in Textflow and listeners on bounds can cause endless loop/crash and other performance issues

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx pull/564/head:pull/564
$ git checkout pull/564

Update a local copy of the PR:
$ git checkout pull/564
$ git pull https://git.openjdk.org/jfx pull/564/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 564

View PR using the GUI difftool:
$ git pr show -t 564

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/564.diff

FlorianKirmaier avatar Jul 06 '21 14:07 FlorianKirmaier

:wave: Welcome back fkirmaier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Jul 06 '21 14:07 bridgekeeper[bot]

/reviewers 2

kevinrushforth avatar Jul 06 '21 14:07 kevinrushforth

@kevinrushforth The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

openjdk[bot] avatar Jul 06 '21 14:07 openjdk[bot]

How to proceed to get this PR reviewed?

christianheilmann avatar Jun 09 '22 07:06 christianheilmann

I did just run into this issue again, in a totally different application. I guess we have now make a fork also for our desktop applications. (Currently we only for for JPro)

So it would be great to get this into JavaFX, because I'm quite sure it fixes an important issue.

FlorianKirmaier avatar Jul 27 '22 14:07 FlorianKirmaier

For what it's worth, I agree that updating the bounds or synchronizing peers shouldn't trigger a new layout cycle. Unfortunately, test coverage is not good for the javafx.scene.text classes, so we won't catch regressions that way. Do you think that you can come up with a minimal test that shows a different outcome with or without the call to getParent().layout()? If we had that, we could decide with more certainty which behavior we think is more correct.

mstr2 avatar Jul 27 '22 16:07 mstr2

I got this problem as well today. NPE as runs is null in line 359. Does it make sense to first 'resolve' this by adding a simple null check and later we may try your other change which removes the call to getParent().layout()?

Edit: I also got another interesting NPE in PrismTextLayout:

java.lang.NullPointerException: Cannot read the array length because "this.runs" is null
    at com.sun.javafx.text.PrismTextLayout.addTextRun(PrismTextLayout.java:770)
    at com.sun.javafx.text.GlyphLayout.addTextRun(GlyphLayout.java:140)
    at com.sun.javafx.text.GlyphLayout.breakRuns(GlyphLayout.java:210)
    at com.sun.javafx.text.PrismTextLayout.buildRuns(PrismTextLayout.java:785)
    at com.sun.javafx.text.PrismTextLayout.layout(PrismTextLayout.java:1036)
    at com.sun.javafx.text.PrismTextLayout.ensureLayout(PrismTextLayout.java:223)
    at com.sun.javafx.text.PrismTextLayout.getBounds(PrismTextLayout.java:246)
    at javafx.scene.text.Text.getLogicalBounds(Text.java:432)
    at javafx.scene.text.Text.doComputeGeomBounds(Text.java:1187)
    at javafx.scene.text.Text$1.doComputeGeomBounds(Text.java:149)
    at com.sun.javafx.scene.shape.TextHelper.computeGeomBoundsImpl(TextHelper.java:90)
    at com.sun.javafx.scene.NodeHelper.computeGeomBounds(NodeHelper.java:116)
    at javafx.scene.Node.updateGeomBounds(Node.java:3818)
    at javafx.scene.Node.getGeomBounds(Node.java:3780)
    at javafx.scene.Node.updateBounds(Node.java:776)
    at javafx.scene.Parent.updateBounds(Parent.java:1833)
    ...
   <<pulse>>

Maran23 avatar Aug 05 '22 07:08 Maran23

I've just verified that the test is also green with the call to parent.layout() So it might be an option, to change this PR to only contain the null check.

When I find some time, I will write a test to show how the parent.layout() call can cause issues. Similar issues exist with Group - which sometimes has the effect, that applications stop working when scenic view is used. (or other tooling)

The exception in PrismTextLayout tends to happen when also the bug of this PR happens - but I don't really know how they relate.

FlorianKirmaier avatar Aug 19 '22 09:08 FlorianKirmaier

So it might be an option, to change this PR to only contain the null check.

If that is the route you want to go, then filing a new issue / PR is better. A fix with a null check might be a good option as long as we understand why the null is happening (so we can know whether the null check is sufficient).

When I find some time, I will write a test to show how the parent.layout() call can cause issues. Similar issues exist with Group - which sometimes has the effect, that applications stop working when scenic view is used. (or other tooling)

That would be helpful. As it is, this PR is not really ready for review for the reasons I mentioned (much) earlier.

kevinrushforth avatar Aug 19 '22 13:08 kevinrushforth

I've just removed the change in the layouting. So I guess the PR should be good now.

FlorianKirmaier avatar Aug 22 '22 09:08 FlorianKirmaier

FYI: in the near past we have also many of those "crashes" when the UI gets heavily restructured. Crashes in our case means that on Windows for example you get the loading circle but the UI freezes endlessly. I assume it might be related.

Question: does the simple null check help in such a case?

danielpeintner avatar Sep 20 '22 08:09 danielpeintner

The simple null check helps in this case. But sometimes I see the exception, posted by @Maran23 . So the root cause is probably not fixed - but the symptoms are. I guess, this PR is then a good progress.

Imo the main issue is, that having listeners on the bounds causes a change in the behavior of TextFlow and Group - causing unpredictable behavior.

FlorianKirmaier avatar Sep 21 '22 12:09 FlorianKirmaier

The simple null check helps in this case. But sometimes I see the exception, posted by @Maran23 . So the root cause is probably not fixed - but the symptoms are. I guess, this PR is then a good progress.

Maybe it's not good progress. You've investigated the issue, and have identified a likely root cause. Not fixing the root cause just means that it most likely won't be fixed for years to come, and the insights you've accumulated will fade.

I think we should just fix the root cause. If some application breaks as a result of this, not much harm is done: we'd gain valuable insight into a real-world situation that depends on the current behavior, and can then decide whether to revert the fix, or provide a viable alternative solution (for users) that accomplishes the same goal.

mstr2 avatar Sep 21 '22 13:09 mstr2

@mstr2 I see you arguments and your concerns. On the other hand, as a developer affected by this problem, it is different. Costumers are complaining and now it seems to occur even more frequently.

Note: I do not have any voting rights and I do not want to make any claims. Anyhow, I think we all agree that the simple null check solves the issue by hiding it and definitely does not create any regression. Hence, affected applications may get a quick fix and applications are running stable. The actual fix can come later, taking more time, with more sophisticated testing et cetera. I am happy to help there with our application as well. Thanks!

danielpeintner avatar Sep 22 '22 06:09 danielpeintner

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Mar 31 '23 23:03 bridgekeeper[bot]

@FlorianKirmaier this one fell off my radar. The updated fix seems fine as long the check for null runs isn't making some other problem. Btw, the title of the bug doesn't match the problem being fixed, so I recommend changing it to something like "TextFlow: listeners on bounds can throw NPE while computing text bounds".

kevinrushforth avatar Apr 01 '23 12:04 kevinrushforth

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Apr 29 '23 16:04 bridgekeeper[bot]

@FlorianKirmaier This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar May 27 '23 21:05 bridgekeeper[bot]

/open

FlorianKirmaier avatar Sep 26 '23 08:09 FlorianKirmaier

@FlorianKirmaier This pull request is now open

openjdk[bot] avatar Sep 26 '23 08:09 openjdk[bot]

@kevinrushforth I guess it also fell under my radar too.

I've merged the branch with master! I also changed the title as suggested.

Testing: I've extensively tested the Change since the PR was created. All JPro versions use this bugfix, and a very big Desktop application uses these fixes. So I'm sure this doesn't cause any instability.

FlorianKirmaier avatar Sep 26 '23 08:09 FlorianKirmaier

@prrace do you want to be the second reviewer?

kevinrushforth avatar Sep 26 '23 13:09 kevinrushforth

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Oct 24 '23 17:10 bridgekeeper[bot]

@andy-goryachev-oracle Can you be the second reviewer?

kevinrushforth avatar Oct 25 '23 17:10 kevinrushforth

@FlorianKirmaier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8269921: TextFlow: listeners on bounds can throw NPE while computing text bounds

Reviewed-by: kcr, angorya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 47 new commits pushed to the master branch:

  • 0d3341704c1ced4f61b5f7f7618e23ec182ae379: 8320267: WebView crashes on macOS 11 with WebKit 616.1
  • d65d8a65414b96ca475d0914b6533cd6e0192984: 8316518: javafx.print.Paper getWidth / getHeight rounds values, causing errors.
  • c46c172be65370ef1a121d80b71cfb34cf887d2e: 8320773: [macOS] All IME input blocked
  • b80ec391cbba72d84b4b862b3f1b8db2ff8eb6e2: 8313648: JavaFX application continues to show a black screen after graphic card driver crash
  • 3548cddedcf3e99af9263efc8004a092a55af8fb: 8320444: Column drag header is positioned wrong for nested columns
  • 606878af275dbad551a10653c92a2deef61c10cd: 8318387: Update GStreamer to 1.22.6
  • a1036b2691cd0cdaa0e99e9583558ee9dc103692: 8303826: Add FX test for JDK-8252255
  • cda623d74a17c1aac034bac175e4fb10bd1fd52b: 8087700: [KeyCombination, Mac] KeyCharacterCombinations behave erratically
  • 2e7330417c2f39fee15676966d5b7455bb3fcbff: 8319996: Update to GCC 13.2.0 on Linux
  • 59c86bbe2b83a6e76f0dfc357f3197fe7af79e6c: 8303478: DatePicker throws uncatchable exception on tab out from garbled text
  • ... and 37 more: https://git.openjdk.org/jfx/compare/5d1254fb1a126ac3d6baff5c5a303ed750d083fa...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kevinrushforth, @andy-goryachev-oracle) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk[bot] avatar Nov 14 '23 16:11 openjdk[bot]

@FlorianKirmaier This PR is ready for you to integrate.

kevinrushforth avatar Nov 16 '23 15:11 kevinrushforth

@FlorianKirmaier As a reminder, this PR is ready for you to /integrate

kevinrushforth avatar Nov 28 '23 22:11 kevinrushforth

/integrate

FlorianKirmaier avatar Nov 30 '23 14:11 FlorianKirmaier

@FlorianKirmaier Your change (at version dda1a0c43045a999abb7cdc4cb531a3db0cc856d) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Nov 30 '23 14:11 openjdk[bot]