jfx icon indicating copy to clipboard operation
jfx copied to clipboard

8322619: Parts of SG no longer update during rendering - overlapping - culling - dirty

Open FlorianKirmaier opened this issue 1 year ago • 28 comments

In some situations, a part of the SG is no longer rendered. I created a test program that showcases this problem.

Explanation:

This can happen, when a part of the SG, is covered by another Node. In this part, one node is totally covered, and the other node is visible.

When the totally covered Node is changed, then it is marked dirty and it's parent, recursively until an already dirty node is found. Due to the Culling, this totally covered Node is not rendered - with the effect that the tree is never marked as Clean.

In this state, a Node is Dirty but not It's parent. Based on my CodeReview, this is an invalid state which should never happen.

In this invalid state, when the other Node is changed, which is visible, then the dirty state is no longer propagated upwards - because the recursive "NGNode.markTreeDirty" algorithm encounters a dirty node early.

This has the effect, that any SG changes in the visible Node are no longer rendered. Sometimes the situation repairs itself.

Useful parameters for further investigations: -Djavafx.pulseLogger=true -Dprism.printrendergraph=true -Djavafx.pulseLogger.threshold=0

PR: This PR ensures the dirty flag is set to false of the tree when the culling is used. It doesn't seem to break any existing tests - but I'm not sure whether this is the right way to fix it. It would be great to have some feedback on this solution - maybe guiding me to a better solution.

I could write a test, that just does the same thing as the test application, but checks every frame that these nodes are not dirty - but maybe there is a better way to test this.


Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issues

  • JDK-8322619: Parts of SG no longer update during rendering - overlapping - culling - dirty (Bug - P4)
  • JDK-8332040: Parts of SG no longer update during rendering - overlapping - culling - dirty (CSR) (Withdrawn)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1310/head:pull/1310
$ git checkout pull/1310

Update a local copy of the PR:
$ git checkout pull/1310
$ git pull https://git.openjdk.org/jfx.git pull/1310/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1310

View PR using the GUI difftool:
$ git pr show -t 1310

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1310.diff

Webrev

Link to Webrev Comment

FlorianKirmaier avatar Dec 21 '23 12:12 FlorianKirmaier

:wave: Welcome back fkirmaier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Dec 21 '23 12:12 bridgekeeper[bot]

Fixed wrong ticket-number in title and commit

FlorianKirmaier avatar Dec 21 '23 12:12 FlorianKirmaier

@FlorianKirmaier Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk[bot] avatar Dec 21 '23 12:12 openjdk[bot]

/reviewers 2

kevinrushforth avatar Dec 21 '23 14:12 kevinrushforth

@kevinrushforth The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

openjdk[bot] avatar Dec 21 '23 14:12 openjdk[bot]

If someone who understands the rendering could provide some feedback on what is happening - than I can probably make a better fix, and write a reasonable unit-test for this bug.

removed some text with edit, because after thinking about it, it was wrong

FlorianKirmaier avatar Dec 21 '23 15:12 FlorianKirmaier

The fully covered node which was dirty is now marked clean in the PR when it is culled. Now that it is clean, what happens when it is uncovered (being careful not to change it in any other way)? Does it get rendered correctly, or does it miss the last change made to it while it was fully covered?

If that works correctly still, then I think this is the right fix. Culling the node is (IMHO) the same as rendering it (there is nothing to render) so it should be clean after being culled.

hjohn avatar Dec 22 '23 08:12 hjohn

It is worth noting that the idea of cleaning the dirty tree, for the culled node, is also done at line NGNode:1414, but for some reason, it doesn't catch the case from this bug.

FlorianKirmaier avatar Dec 22 '23 12:12 FlorianKirmaier

I tried out a bit, by removing the overlapping area in my test program. I added the changed program to the ticket. Didn't notice anything wrong. There are probably many ways to test for regressions.

Can't guarantee that there are no cases that go wrong - but if something doesn't work, then it probably happens more rarely than the original bug.

FlorianKirmaier avatar Jan 16 '24 11:01 FlorianKirmaier

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Feb 20 '24 12:02 bridgekeeper[bot]

push It's still a important bug. If someone can guide me on how to write tests for this, i would put in the effort. Maybe we could make snapshot based tests? That should work on all platforms, right? (Like testing whether a specific RGB value appears) Has something in this direction been done somewhere?

FlorianKirmaier avatar Mar 13 '24 09:03 FlorianKirmaier

Maybe we could make snapshot based tests? That should work on all platforms, right?

We have a whole suite of tests which use similar methods to perform checks, called robot tests. Since you want to check if a node was properly updated, and the best way to do that seems to be checking RGB values, I think a test for this case belongs right there.

You could use robot utility to check color values of specific pixels on the screen. I think the best way to go about it would be to do something similar to what your test bug apps do (the ones attached to the JBS issue). Ideally I would try and look for first some form of confidence check (if correct controls are "on" and "off"), then progressing the animation and seeing if controls updated as intended.

All robot tests are located in tests/system/src/test/java/test/robot and then they follow package names. I'm not 100% sure where this test should go though (I assume .../javafx/scene?).

lukostyra avatar Mar 13 '24 09:03 lukostyra

❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.

openjdk[bot] avatar Mar 13 '24 20:03 openjdk[bot]

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Apr 10 '24 21:04 bridgekeeper[bot]

I analysed the bug and can confirm the fix. I summarized it in the following table. First column is the node structure, the others the steps: update nodes, render, update nodes, render, ...

Nodes (L - left and R - right) / Steps 1st -> update Circles 2nd -> render 3rd -> update Lines 4th -> render
HBox dirty clearDirty dirty clearDirty
HBox >StackPane-L dirty clearDirty dirty clearDirty
HBox >StackPane-L > Group-L dirty clearDirty dirty clearDirty
HBox >StackPane-L > Group-L >Line-L dirty clearDirty
HBox >StackPane-L > Group-L >Circle-L dirty clearDirty
HBox >StackPane-R dirty clearDirty *1 *3
HBox >StackPane-R > Group-R dirty dirty dirty
HBox >StackPane-R > Group-R > Line-R dirty *2 dirty
HBox >StackPane-R > Group-R > Circle-R dirty dirty dirty dirty

*1: > no culling bits for first dirty region (Circle-L) -> skip children -> clearDirty() is not called on children > no root path for the second dirty region (Circle-R) -> no rendering and root.clearDirtyTree() stops on StackPane-R (the children stay dirty: Group-R and Circle-R)

*2: > Line-R.markTreeDirty() skips marking of further parents because Group-R is already dirty -> the StackPane-R stays clean

*3: > the dirty regions are not collected because the StackPane-R is clean -> Line-R is not rendered and the dirty-flag is not reset


The error is that clearDirty() is not called in the second step on the children of StackPane-R (see *1). Subsequent changes in Line-R and Circle-R are not propagated to the root node because Group-R is already dirty (see *2) and therefore StackPane-R remains clean. The last changes are therefore ignored in the next rendering phase (see *3).


We can simply say: the dirty flags must be deleted on all nodes after rendering.

If this is true, the test can simply check this instead of screenshots or similar.

eduardsdv avatar Apr 26 '24 16:04 eduardsdv

After further investigation and testing I would suggest to remove all clearDirty() and clearDirtyTree() calls and put only one clear-dirty pass after rendering is done (at the end of ViewerPainter.paintImpl(final Graphics backBufferGraphics)).

Since the markDirty() method does both, sets the dirty flag and propagates it to the ancestor nodes, it is better if there is only one method for deleting the dirty flag from the node and all its children. Therefore, I would remove the clearDirtyTree() method and move its content to the clearDirty() method.

This way we can guarantee that all dirty flags are removed after the rendering is completed. We also guarantee that a clean parent with dirty children will never occur (the bug this PR addresses).

Furthermore, my quick test shows that fewer clearDirty() calls are required in the new version. Bonus: We are even faster.

    public void clearDirty() {
        if (dirty != DirtyFlag.CLEAN || childDirty) {
            dirty = DirtyFlag.CLEAN;
            childDirty = false;
            dirtyBounds.makeEmpty();
            dirtyChildrenAccumulated = 0;

            if (this instanceof NGGroup) {
                List<NGNode> children = ((NGGroup) this).getChildren();
                for (NGNode child : children) {
                    child.clearDirtyTree_();
                }
            }
        }

        if (getClipNode() != null) {
            getClipNode().clearDirty();
        }
    }

eduardsdv avatar Apr 29 '24 16:04 eduardsdv

I added a test that shows that the @FlorianKirmaier's fix works. You can start it with: gradlew -PFULL_TEST=true :systemTests:test --tests NGNodeDirtyFlagTest.

To avoid such errors in the future, I would still suggest the refactoring, which I wrote about in my last comment.

@kevinrushforth and @arapte please review.

eduardsdv avatar May 03 '24 10:05 eduardsdv

I created an alternative solution for this bug. See PR: https://github.com/openjdk/jfx/pull/1451

eduardsdv avatar May 06 '24 14:05 eduardsdv

/csr unneeded

kevinrushforth avatar May 11 '24 13:05 kevinrushforth

@kevinrushforth The CSR requirement cannot be removed as CSR issues already exist. Please withdraw JDK-8332040 and then use the command /csr unneeded again.

openjdk[bot] avatar May 11 '24 13:05 openjdk[bot]

/csr unneeded

kevinrushforth avatar May 11 '24 19:05 kevinrushforth

@kevinrushforth determined that a CSR request is not needed for this pull request.

openjdk[bot] avatar May 11 '24 19:05 openjdk[bot]

Sorry for creating the CSR, it was an accident.

Now that Eduard has both created an alternative solution, but also has reviewed that this solution is correct, and provided an unit-test - I think this (or the other) PR is ready to move forwards.

@kevinrushforth What do you think?

This is still quite a fundamental bug which is probably quite prevalent, because when it happens it is really hard to notice it. But it probably happens more often than we think.

FlorianKirmaier avatar May 17 '24 12:05 FlorianKirmaier

Sorry for creating the CSR, it was an accident.

No problem.

Now that Eduard has both created an alternative solution, but also has reviewed that this solution is correct, and provided an unit-test - I think this (or the other) PR is ready to move forwards.

@kevinrushforth What do you think?

Yes, I've asked @arapte to be the primary reviewer on this. I'll leave it to his judgment as to which of the two approaches to take forward.

kevinrushforth avatar May 18 '24 14:05 kevinrushforth

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jun 15 '24 15:06 bridgekeeper[bot]

keep alive

eduardsdv avatar Jun 17 '24 07:06 eduardsdv

@arapte Please review.

eduardsdv avatar Jun 26 '24 07:06 eduardsdv

@FlorianKirmaier since it looks like we are going with PR #1451 as the solution, please close this PR (or move it to Draft until #1451 is integrated and then close it).

kevinrushforth avatar Jul 04 '24 12:07 kevinrushforth