fasten
fasten copied to clipboard
Caching artifact downloads for call graph construction
As described in #338, we can optimize the network traffic of the FASTEN pipeline by:
1- Downloads should be cached in a configurable folder.
2- The download logic should be refactored such that it can be mocked for the CallGraphConstructorTest
to avoid a download and make the test faster.
We could also try to start a basic web server in the test and download the package from localhost, to avoid introducing any actual internet traffic.
The important thing is that the refactoring will make it possible to cache packages, so we can eliminate downloads in other tests that make use of this class.
Is this issue still valid? Now that all Maven coordinates are downloaded to the .m2 folder, it should be straight forward to get access to .jars without downloading them again.
I can still look into this and extend the OPAL plugin to use the local JARs instead of downloading them again.
Can you do it, or do you do it?
Can you do it, or do you do it?
@proksch I will do it tomorrow.
One of us is missing the point of the issue.
From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL
accordingly to use a dir (i.e., the .m2
folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer
logic such that is uses the .m2
folder, so this could be the caching folder that is mentioned in the first post of this PR...
I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities
that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2
folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2
folder.
In particular, I am talking about the various downloadPom methods and httpGetToFile, the implementations are completely brute-force and completely disrespect the cacheable nature of Maven. Instead, these methods should first check whether the file can be found in the local .m2
repo and if not, should use either Shrinkwrap or the Maven library to resolve the pom/jar, because this will actually create the file in the .m2
folder. Once the file exists there, the methods can just return the file pointer.
Looking at the discussion that we had with @michelescarlato and @MagielBruntink, this would also be the right place to implement the download logic for source .jars. Instead of just having parameters for group, artifact, and version and having two methods, one for poms and one for jars, we could reduce this to a single method with a much more general interface, in which we also allow to define the packaging type (e.g., pom
, jar
, war
, ..) and the classifier (e.g., none, javadoc
, sources
, ...). The mechanism is then the same for all Maven artifacts and it would be one logic to rule them all.
One of us is missing the point of the issue.
From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of
OPAL
accordingly to use a dir (i.e., the.m2
folder), so we do not have downloads on every build. Since then, we have changed thePomAnalyzer
logic such that is uses the.m2
folder, so this could be the caching folder that is mentioned in the first post of this PR...I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the
MavenUtilities
that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the.m2
folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the.m2
folder.
Please let me clarify this.
Before this fix, OPALPlugin
downloads JAR files on the fly to generate a call graph. Now, OPALPlugin
does this:
- Checks if the JAR file of a given Maven coordinate exists in the
.m2
folder (created byPOMAnalyzer
), if yes, it uses the local JAR file to generate a CG. - If the JAR file doesn't exist in the
.m2
folder, the plugin usesMavenUtilties
to downland the JAR file from Maven central to generate a call graph. This is now a "fallback" method.
That said, I can extend MavenUtilties
to also store downloaded JARs in .m2
.
This case distinction is not a concern of the OPAL plugin, this is the task of the implementation logic of the downloader. I strongly disagree with handling this logic in the OPAL plugin.
Would it make sense to clarify this task in person, before you spend more time on it?
Sure, we can discuss this in person on Monday.
@proksch, also, if I understand correctly, POMAnalyzer
already resolves POMs, JARs, and sources for a given Maven coordinate and stores them in .m2
. So OPALPlugin
re-uses the JAR artifacts from .m2
if available. HOWEVER, it would be good to also extend MavenUtilties
to do the same.
The PomAnalyzer
only resolves .pom and .jar, checks if a source file exists at the expected location, and -if so- includes a URL for it. Part of the downloading logic is indeed implemented in the MavenRepositoryUtils that are part of the PomAnalyzer
... it would be a great idea to merge this class with the MavenUtilities
and provide a more general version. In contrast to the latter, the MavenRepositoryUtils
are thoroughly tested, these tests could be extended to cover the general case. Let's talk about this on Monday.
The Resolver class is also relevant here.