jfx
jfx copied to clipboard
8269907 memory leak - Dirty Nodes / Parent removed
After thinking about this issue for some time, I've now got a solution. I know put the scene in the state it is, before is was shown, when the dirtyNodes are unset, the whole scene is basically considered dirty. This has the drawback of rerendering, whenever a window is "reshown", but it restores sanity about memory behaviour, which should be considered more important.
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)
Integration blocker
⚠️ Title mismatch between PR and JBS for issue JDK-8269907
Warning
⚠️ Found leading lowercase letter in issue title for 8269907: memory leak - Dirty Nodes / Parent removed
Issue
- JDK-8269907: MemoryLeak: Scene.dirtyNodes and Parent.removed referencing removed Nodes when window is not showing (Bug - P4) ⚠️ Title mismatch between PR and JBS.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/584/head:pull/584
$ git checkout pull/584
Update a local copy of the PR:
$ git checkout pull/584
$ git pull https://git.openjdk.org/jfx.git pull/584/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 584
View PR using the GUI difftool:
$ git pr show -t 584
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/584.diff
Webrev
:wave: Welcome back fkirmaier! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
Webrevs
- 09: Full - Incremental (e8e2f592)
- 08: Full - Incremental (dfb34701)
- 07: Full - Incremental (7e47e060)
- 06: Full (c5c5e080)
- 05: Full - Incremental (580ba7d3)
- 04: Full - Incremental (22326ccf)
- 03: Full - Incremental (29b90745)
- 02: Full - Incremental (28b793b5)
- 01: Full - Incremental (c603d2b1)
- 00: Full (56c2c8dc)
/reviewers 2
@kevinrushforth The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).
Ok, thank you for the feedback.
Is there a way, to do something with the renderlock on, with an similar API as Platform.runLater?
Then it would also be possible, to clear the dirtyNodes in the supposed way, without causing the minor performance issue of rerendering the whole Scene again. This would be a much cleaner approach.
During the past PRs the unit tests were quite unstable, but they seem to work.
I've reworked my fix. It now registers a pulseListener for the cleanup. The is also how i initially thought, the fix should look like. To do so, I've added the option to the Toolkit to register such listeners.
The fix no longer has the drawback, of rerendering the whole window, when it's showing property is changed.
The revised approach doesn't appear to have the same threading issues that the first attempt did (good). I would have thought you could add some logic to the existing scene pulse listener instead of creating a new one. I'll take a closer look when I review this.
It would be possible to add it to the existing PulseListener, but i think it would be more complicated. It might require some code to decide whether it should only clean the dirty children, or do the full pulse. The logic to add/remove the listener also would have to be more complicated.
Actually it was my first approach, but i didn't find a good place "to get started".
I don't have a good suggestion at the moment.
Converting this to Draft, since it isn't (yet) ready for review.
@kevinrushforth Can we rediscuss whether this version of the fix is ok?
I think the fix is technically fine. Adding a "runLater" like api to the PulseListener should also be fine, and might be useful for other problems too. The fix is also already used in production - for the latest JPro version and for desktop applications. Therefore I'm very confident, that this fix doesn't cause any problems.
This fix is also very important for large desktop applications. In context with Popups, the bug can cause unpredictable memory leaks.
I took a closer look, and the current PR has at least one problem. The call to synchronizeSceneNode
must only be done with the renderLock
held. If you are using this in production, then it likely means you aren't hitting the cases where this would break. I recommend either my earlier suggestion of hooking into the existing listener, or else you will need to acquire the renderLock
. If the latter, then one thing you will want to avoid is running both a regular sync pass and a cleanup pass in the same pulse. I see that you try to ensure that by only adding the listener if the scene is not showing, but since you might have to also check in the listener itself (not sure).
Thank you for your feedback! To solidify my understanding - the "RenderScenegraph" is only allowed to be changed when the render lock is held. This scene graph is represented by all these NG<Something> classes, correct?
I've now updated my PR. I found some code for snapshot, which solves the same problem about aquiring the render lock. I've moved it into an own method and used it for my fix.
The regular sync, and the new sync shouldn't happen in the same pulse - because the regular sync checks whether the Window is closed by if (peer != null)
. At least it shouldn't require the lock. If it would for some reason aquire the lock, then it should basically be a noop - so i don't really see why this would be an issue. Performance wise aquiring an additional lock can cause a very minor slowdown, but because it only happens once per closed window, it is realy minor.
To solidify my understanding - the "RenderScenegraph" is only allowed to be changed when the render lock is held. This scene graph is represented by all these NG classes, correct?
Correct. When the renderLock is not held, the renderer thread is free to render it and can be sure it won't change.
I'll take a look at the updated fix.
If found a bug in the current implementation and fixed it - which took quite long. It seems like, calling synchronizeSceneNodes without calling updateBounds() can cause that the NGNode can get out of sync. Which then caused rendering bugs with disappearing objects.
This is now fixed.
I've also added a call to the syncLights
and synchronizeSceneProperties
methods.
I think it's less complex to just sync everything, instead of just syncing a subset - also when making a snapshot.
Maybe it would make sense to add a check like the following, to the doUpdatePeer() method?
if(txBoundsInvalid) {
PlatformLogger logger = Logging.getInputLogger();
if (logger.isLoggable(Level.WARN)) {
logger.finer("txBoundsInvalid ot of sync on doUpdatePeer");
}
}
How to proceed to get this PR reviewed?
I've just merged it with master, to make it easier to review, if someone finds the time.
@FlorianKirmaier this pull request can not be integrated into master
due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8269907-dirty-and-removed
git fetch https://git.openjdk.org/jfx master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
I've just merged with master again. It would be great if this could be reviewed at some time.
I took another good look at this solution, and I would like to offer an alternative as I think this solution is more dealing with symptoms of the underlying problem.
I think the underlying problem is:
-
Parent
is tracking aremoved
list, which is a list ofNode
. However, it only requires the peer for the dirty area calculation. This is a classic case of not being specific enough with the information you need. IfParent
is modified to storeNGNode
in itsremoved
list, then the problem ofremoved
keeping references to old nodes disappears asNGNode
does not refer back toNode
. Note: there is a test method currently inParent
that returns the list of removed nodes -- these tests would need to be modified to work with a list ofNGNode
or some other way should be found to check these cases. -
Scene
keeps tracking dirty nodes even when it is not visible. The list of dirty nodes could (before this fix) become as big as you want as it was never cleared as the pulse listener that synchronizes the nodes never runs whenpeer == null
-- keep adding and removing new nodes while the scene is not shown, and the list of dirty nodes grows indefinitely. This only happens if the scene has been shown at least once before, as before that time special code kicks in that tries to avoid keeping track of all scene nodes in the dirty list.
I think the better solution here would be to reset the scene to go back to its initial state (before it was shown) and sync all nodes when it becomes visible again. This can be achieved by setting the dirty node list to null
to trigger the logic that normally only triggers on the first show. addToDirtyList
should simply do nothing when peer == null
.
I believe the syncPeer
call is smart enough not to update the peer in the situation where a scene is hidden and then reshown, even if Scene
calls it again on all its nodes (syncPeer
in Node
will check dirtyBits.isEmpty()
).
I'd also highly recommend using ArrayList
instead of a manual Node[] dirtyNodes
in Scene
.
@hjohn
The Parent is tracking a removed list, which is a list of Node. However, it only requires the peer for the dirty area calculation. This is a classic case of not being specific enough with the information you need. If Parent is modified to store NGNode in its removed list, then the problem of removed keeping references to old nodes disappears as NGNode does not refer back to Node. Note: there is a test method currently in Parent that returns the list of removed nodes -- these tests would need to be modified to work with a list of NGNode or some other way should be found to check these cases.
This would just end in an leak of the NGNode, instead of the Node? There are also some leaks related to NGNode, bug guess I'll have to fix the more obvious bugs first. Also, NGNode sometimes refers to the Node. There are a lot of issues in this area.
Resetting the Scene
This approach has one problem - it would change the complexity of showing/hiding a window. This issue typically happened with Popups, which are regularly shown/hidden. I guess changing the complexity to O(
I don't want to change the dirtyNodes to ArrayList in this PR, i don't want to mix this kind of refactoring into the Bugfix-PR.
@hjohn
The Parent is tracking a removed list, which is a list of Node. However, it only requires the peer for the dirty area calculation. This is a classic case of not being specific enough with the information you need. If Parent is modified to store NGNode in its removed list, then the problem of removed keeping references to old nodes disappears as NGNode does not refer back to Node. Note: there is a test method currently in Parent that returns the list of removed nodes -- these tests would need to be modified to work with a list of NGNode or some other way should be found to check these cases.
This would just end in an leak of the NGNode, instead of the Node? There are also some leaks related to NGNode, bug guess I'll have to fix the more obvious bugs first. Also, NGNode sometimes refers to the Node. There are a lot of issues in this area.
Why not fix the root causes? This PR introduces several new things to fix a bug in a round about way that would not be needed at all if the root causes are fixed. I'm happy to help out here, as I prefer fixing the underlying problems (which probably solves multiple problems at once) rather than having to deal with each symptom and making things even more inscrutable in places where these symptoms appear.
If you know the root causes, then please explain them so they can be dealt with. I've only taken a short look, and I agree that NGNode
probably would be leaked instead. However, the end goal is to track how big the dirty area is. If keeping a list of nodes/ngnodes is bad, then how about keeping a rectangle instead and tracking the dirty area immediately? That would also solve problems related to NGNode pointing back to Node.
This would likely solve multiple problems, not just related to stages -- I suppose nodes can be detached and this removed list could cause problems then as well.
Resetting the Scene
This approach has one problem - it would change the complexity of showing/hiding a window. This issue typically happened with Popups, which are regularly shown/hidden. I guess changing the complexity to O() might be ok - but I'm not sure about it.
I think that the resulting code would be simpler, with less special cases (why should a stage that was shown before act differently from a fresh one). The O(n) loop to sync all peers should not be an issue when doing a relatively heavy action like showing a window as syncPeer
will do nothing for nodes that haven't become dirty.
Also, it has benefits as well. Changes to nodes on a shown then hidden stage would now be more performant, and would eliminate any performance differences between a fresh stage and one that was shown before.
Adding to the above, I couldn't find in what cases Node
might be referred from NGNode
. Where is this happening?
Further, NGNode
only has a parent pointer, not a full parent/child hierarchy like Parent
has. This would mean that if you track NGNode
in Parent#removed
a big part of the problem is alleviated as only a few parent NGNode
will be unable to be GC'd, versus entire UI trees when you refer to Node
.
I also did some more digging, and the only thing that the dirty region computation for removed nodes really requires is a RectBounds
. To do this it accesses only a couple of fields of NGNode
, no heavy calculations are involved in this part and so this could be done right away -- it's also possible that Node
may already have this information in some form (basically we need to know the last bounds of the node when it was rendered on screen -- saving a copy in the doUpdatePeer
call may be sufficient). When the dirty area computation for the parent is performed, you'd only need to apply the transforms and clipping logic that the parent provides on this RectBounds
-- no further fields of the removed NGNode
would need to be accessed.
@FlorianKirmaier This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@hjohn Sorry for the late reply.
Further, NGNode only has a parent pointer, not a full parent/child hierarchy like Parent has.
An NGNode is usually an NGGroup, which has its full tree referenced.
I guess it would be possible, to do the following changes:
- Always rerender a stage, if it's shown again - instead of only rendering the dirty area (and remove dirtyNodes when the stage is no longer shown)
- Remove the "removed" nodes from NGGroup. (either by refactoring and precalculating the dirty area, or just clearing it when a window is no longer shown)
This would change the O(n) complexity of showing/hiding a stage, but the rendering would be less complex. Imo, this would require more intrusive changes than this PR. I guess someone has to decide on it.
What I don't understand is, why this PR is considered to be complex. It just adds some kind of runLater to the Toolkit (I'm surprised, it didn't already exist) and calls it to clean up the dirtyNodes, when a window is no longer shown. I think that's quite straightforward.
@FlorianKirmaier This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@FlorianKirmaier This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open
pull request command.