fasten Caching artifact downloads for call graph construction

As described in #338, we can optimize the network traffic of the FASTEN pipeline by: 1- Downloads should be cached in a configurable folder. 2- The download logic should be refactored such that it can be mocked for the CallGraphConstructorTest to avoid a download and make the test faster.

Dec 13 '21 12:12 mir-am

We could also try to start a basic web server in the test and download the package from localhost, to avoid introducing any actual internet traffic.

The important thing is that the refactoring will make it possible to cache packages, so we can eliminate downloads in other tests that make use of this class.

Dec 13 '21 14:12 proksch

Is this issue still valid? Now that all Maven coordinates are downloaded to the .m2 folder, it should be straight forward to get access to .jars without downloading them again.

May 19 '22 21:05 proksch

I can still look into this and extend the OPAL plugin to use the local JARs instead of downloading them again.

May 20 '22 07:05 mir-am

Can you do it, or do you do it?

May 20 '22 22:05 proksch

Can you do it, or do you do it?

@proksch I will do it tomorrow.

May 20 '22 22:05 mir-am

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

May 21 '22 19:05 proksch

In particular, I am talking about the various downloadPom methods and httpGetToFile, the implementations are completely brute-force and completely disrespect the cacheable nature of Maven. Instead, these methods should first check whether the file can be found in the local .m2 repo and if not, should use either Shrinkwrap or the Maven library to resolve the pom/jar, because this will actually create the file in the .m2 folder. Once the file exists there, the methods can just return the file pointer.

Looking at the discussion that we had with @michelescarlato and @MagielBruntink, this would also be the right place to implement the download logic for source .jars. Instead of just having parameters for group, artifact, and version and having two methods, one for poms and one for jars, we could reduce this to a single method with a much more general interface, in which we also allow to define the packaging type (e.g., pom, jar, war, ..) and the classifier (e.g., none, javadoc, sources, ...). The mechanism is then the same for all Maven artifacts and it would be one logic to rule them all.

May 21 '22 19:05 proksch

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

Please let me clarify this. Before this fix, OPALPlugin downloads JAR files on the fly to generate a call graph. Now, OPALPlugin does this:

Checks if the JAR file of a given Maven coordinate exists in the .m2 folder (created by POMAnalyzer), if yes, it uses the local JAR file to generate a CG.
If the JAR file doesn't exist in the .m2 folder, the plugin uses MavenUtilties to downland the JAR file from Maven central to generate a call graph. This is now a "fallback" method.

That said, I can extend MavenUtilties to also store downloaded JARs in .m2.

May 21 '22 19:05 mir-am

This case distinction is not a concern of the OPAL plugin, this is the task of the implementation logic of the downloader. I strongly disagree with handling this logic in the OPAL plugin.

Would it make sense to clarify this task in person, before you spend more time on it?

May 21 '22 19:05 proksch

Sure, we can discuss this in person on Monday.

@proksch, also, if I understand correctly, POMAnalyzer already resolves POMs, JARs, and sources for a given Maven coordinate and stores them in .m2. So OPALPlugin re-uses the JAR artifacts from .m2 if available. HOWEVER, it would be good to also extend MavenUtilties to do the same.

May 21 '22 19:05 mir-am

The PomAnalyzer only resolves .pom and .jar, checks if a source file exists at the expected location, and -if so- includes a URL for it. Part of the downloading logic is indeed implemented in the MavenRepositoryUtils that are part of the PomAnalyzer... it would be a great idea to merge this class with the MavenUtilities and provide a more general version. In contrast to the latter, the MavenRepositoryUtils are thoroughly tested, these tests could be extended to cover the general case. Let's talk about this on Monday.

May 21 '22 20:05 proksch

The Resolver class is also relevant here.

May 21 '22 20:05 proksch

fasten fasten copied to clipboard

Caching artifact downloads for call graph construction

fasten
fasten copied to clipboard