Gaffer
Investigate removal of repositories other than maven central from POMs
It appears that all the dependencies required by Gaffer are available in Maven Central, the default repository used by Maven, although this may not have been the case in the past. When running builds, Maven occasionally tries to check repos.spark-packages.org if it can't find a package in Maven Central. This is often because of a mistake with the version.
It's unclear whether this repository (shown below, defined in the spark library POM) is actually required or can be removed. On a clean installation of Maven with no previously downloaded dependencies, investigate whether it can be removed without causing any missing dependencies.
<repositories>
    <repository>
        <id>Spark Packages</id>
        <url>https://repos.spark-packages.org/</url>
    </repository>
</repositories>
At least the spark-library module requires graphframes:graphframes, which is not in Maven Central. There doesn't appear to be a way to prevent Maven from also trying this repository when looking for other dependencies.
The problem may be that Maven Central is used only as the fallback for the spark modules because it sits below the Spark Packages repository in the repository definitions. Further testing and a look at the Super POM should confirm this. If Maven Central is also specified explicitly, that may correct the order.
Running mvn help:effective-pom -Dverbose -pl :spark-library confirms that the way the spark-packages repo is specified causes it to take precedence over the default central repo:
<repositories>
    <repository>
        <id>Spark Packages</id> <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 35 -->
        <url>https://repos.spark-packages.org/</url> <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 36 -->
    </repository>
    <repository>
        <snapshots>
            <enabled>false</enabled> <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 33 -->
        </snapshots>
        <id>central</id> <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 28 -->
        <name>Central Repository</name> <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 29 -->
        <url>https://repo.maven.apache.org/maven2</url> <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 30 -->
    </repository>
</repositories>
As a result, Maven will check the spark-packages repo ahead of central. See Maven docs for the priority used. When cloning the project for the first time, this can cause significant delays while Maven tries to fetch every dependency from this repo first, in some cases falling back to central only after a timeout.
The PR to fix this adds central to the POM above spark-packages. This ensures spark-packages is only used as a fallback for artifacts not found on Maven Central, which in practice means the single package graphframes:graphframes.
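As a rough sketch of what that reordering could look like (not necessarily the exact contents of the PR), the repositories block in the spark POM might declare central explicitly ahead of spark-packages. The central id, URL and snapshot setting are copied from the Super POM values shown in the effective POM output above. It is assumed here that reusing the id central makes this entry replace the inherited one rather than add a duplicate.

```xml
<repositories>
    <!-- Assumption: declaring central first makes Maven consult it before Spark Packages. -->
    <repository>
        <id>central</id>
        <name>Central Repository</name>
        <url>https://repo.maven.apache.org/maven2</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <!-- Still needed as a fallback for graphframes:graphframes, which is not on Maven Central. -->
    <repository>
        <id>Spark Packages</id>
        <url>https://repos.spark-packages.org/</url>
    </repository>
</repositories>
```

Re-running mvn help:effective-pom -Dverbose -pl :spark-library after a change like this should show central listed before Spark Packages if the ordering has taken effect.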