maven
maven copied to clipboard
[MNG-7509] - Huge memory cost when parent pom widely used in a big project for dependencyManagement for maven-3.9.x
Following this checklist to help us incorporate your contribution quickly and easily:
- [x] Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- [x] Each commit in the pull request should have a meaningful subject line and body.
- [x] Format the pull request title like
[MNG-XXX] SUMMARY
, where you replaceMNG-XXX
andSUMMARY
with the appropriate JIRA issue. Best practice is to use the JIRA issue title in the pull request title and in the first line of the commit message. - [x] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
- [x] Run
mvn clean verify
to make sure basic checks pass. A more thorough check will be performed on your pull request automatically. - [x] You have run the Core IT successfully.
If your pull request is about ~20 lines of code you don't need to sign an Individual Contributor License Agreement if you are unsure please ask on the developers list.
To make clear that you license your contribution under the Apache License Version 2.0, January 2004 you have to acknowledge this by using the following check-box.
-
[x] I hereby declare this contribution to be licenced under the Apache License Version 2.0, January 2004
-
[x] In any other case, please file an Apache Individual Contributor License Agreement.
Is this changeset not relevant for Maven 4?
mentioned
Currently, I just made this potential enhancement for branch 3.8.x and 3.9.x.
For maven 4, I have checked the code roughly, the classes in change list have been modified (seems that refine some iterate from for sentence to stream api). There should be the same issue in maven 4 as well, but the code change may be a little different since the code base are different.
So I just want to confirm current solution is OK first. Then I can apply similar change for maven 4.
I wonder if someone will give a green light to this PR, cause it fixes nothing: WeakHashMap is a "Hash table based implementation of the Map interface, with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use"
I wonder if someone will give a green light to this PR, cause it fixes nothing: WeakHashMap is a "Hash table based implementation of the Map interface, with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use"
You consider this change pointless?
@michael-o
You consider this change pointless?
The idea of maintaining pool of immutable objects is good, however the implementation proposed isn't: the JVM is free to cleanup dependencyCache
as soon as execution leaves #toDependency
method - there are no more references to depCacheKeykey
. The author is observing some performance improvements just because his JVM has enough free memory to not trigger GC whilst maven builds dependency graph (give JVM more memory to save memory).
The idea of maintaining pool of immutable objects is good, however the implementation proposed isn't: the JVM is free to cleanup dependencyCache as soon as execution leaves #toDependency method - there are no more references to depCacheKeykey. The author is observing some performance improvements just because his JVM has enough free memory to not trigger GC whilst maven builds dependency graph (give JVM more memory to save memory).
Hi @andreybpanfilov, glad to see your comments, I think there may be some misunderstanding for this change.
So first, I need to declare that the main change I want to make should be in ArtifactDescriptorReaderDelegate(maven-resolver-provider), but I noticed that there is same code in RepositoryUtils(maven-core), maybe for some historical reason, so I just made a same change here to keep them consistent.
For the toDependency method in RepositoryUtils only collect dependencyManagement in project pom, as you side, the change here doesn't bring too much increasement.(Entrance: getDependencies:243, LifecycleDependencyResolver (org.apache.maven.lifecycle.internal))
But for the convert method in ArtifactDescriptorReaderDelegate is totally different. It's used for calculating all the dependency relationship. It will travels all the dependencies in the project pom, and collect dependency manageMent for each dependency. Image one case, there are two dependencies in the project. And in each dependency pom, there are two dependencyManagement, so there should be 4 dependency instance will be created to save the information of the dependencyManagement. Image a case which is a little complicated, if the dependenyManagement in above two dependencise are one same parent pom, and there are 1000 dependencies managed in this parent pom. There would be 1000 * 2 = 2000 instance will be created for dependencyManagement, although there is only 1000 after remove duplicate ones. So for our real case in ebay project, there is about 3000 dependencies with a parent pom which managed all these 3000 dependencies version. So when building the project, there will be 3000 * 3000 = 9,000,000 instance will be created for dependencyManagement. But after removing duplicate, there are only 3000 are necessary. That's the change here why I use a hashmap the cache all the dependencyManagement instance.
And why I use a weakHashMap here is I want to make sure when the dependency instance are not refered by graph anymore it can be cleanup by GC, won't stay in memory since the reference from hashMap key.
And for the Memory cost you mentioned, I have also shared a build comparison for our project with command 'export MAVEN_OPTS="-Xms1g -Xmx1g”' to limit the memory in jira ticket. With the PR, it can build successfully, but with current maven code, it will throw OOM exception.
Please let me know if I should share some fake project to simulate this case to make you more clear
@Yougoss
I have read that sad story about 3000 deps, however the problem is how you are using weakhashmap, I'm thinking about something like: https://gist.github.com/andreybpanfilov/aac20d79b5648450dc6946cb438d8f41
Hi @andreybpanfilov , I think I get some of your point. You mean the way of WeakHashMap I used in this PR can't take effect when GC triggers But from my understanding, WeakHashMap means the key in the entry of it is WeakReference which means when gc triggers, the key will be clean up. As a result, the value( dependency ) won't be referred from the key after GC. When there is no other reference on this dependency instance, it can be clean up by GC too.
A further question, do you think this PR is a valid enhancement? For our enterprise code, there are a lot of libraries depends on each other, so we use a parent pom for centralized management and put it in each library. And in this way, we found huge memory cost when building those library and building the project depends on these libraries.
@andreybpanfilov , my intention for the WeakHashMap is not impact those dependency instances cleanup by GC. If I use a HashMap, when the dependency graph has been resolved, those dependency instances can't be cleanup by GC since the cache is a static field and own strong reference to dependency instances. But If I use WeakHashMap, if there is not enough memory and trigger GC, the key will be cleanup after GC since it's WeakReference, and then there is no strong reference to those dependency instances, so that they can be cleanup if there is no reference from dependency graph.
So the intention is, I don't want to make those dependency instances survive all the times after I use a static Map to cache them. Not get your point yet. Can you please share more details?
Seems that the memory enhancement has been already added into 3.9.x