Investigate removal of repositories other than maven central from POMs

Open GCHQDeveloper314 opened this issue 3 years ago • 1 comments

It appears that all the dependencies required by Gaffer are available in Maven central, the default repository used by Maven, although this may not have been the case in the past. When running builds, Maven occasionally tries to check repos.spark-packages.org if it can't find a package in Maven central. This usually happens because the requested version is wrong.

It's unclear whether this repository (shown below, defined in the spark library) is actually required or can be removed. On a clean installation of Maven with no previously downloaded dependencies, investigate whether it can be removed without causing any missing dependencies.

<repositories>
   <repository>
      <id>Spark Packages</id>
      <url>https://repos.spark-packages.org/</url>
   </repository>
</repositories>

GCHQDeveloper314, Jun 17 '22 14:06

At least the spark-library module requires graphframes:graphframes, which is not in Maven central. There doesn't appear to be a way to prevent Maven from also trying this repository when looking for other dependencies.

The problem may be that Maven central is used as the fallback for the spark modules because it sits below the Spark repository in the repository definitions. Further testing and a look at the Super POM will answer this. If that is the cause, also specifying Maven central explicitly may correct the order.

GCHQDeveloper314, Jul 12 '22 14:07

Running mvn help:effective-pom -Dverbose -pl :spark-library confirms that the way the spark-packages repo is specified causes it to take precedence over the default central repo:

<repositories>
    <repository>
      <id>Spark Packages</id>  <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 35 -->
      <url>https://repos.spark-packages.org/</url>  <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 36 -->
    </repository>
    <repository>
      <snapshots>
        <enabled>false</enabled>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 33 -->
      </snapshots>
      <id>central</id>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 28 -->
      <name>Central Repository</name>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 29 -->
      <url>https://repo.maven.apache.org/maven2</url>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 30 -->
    </repository>
</repositories>

As a result, Maven checks the spark repo ahead of central. See the Maven docs for the priority used. When the project is cloned for the first time, this can cause significant delays while Maven tries to fetch from this repo, in some cases only falling back to central after timing out.

The PR to fix this adds central to the POM above spark-packages. This ensures that spark-packages is only used as a fallback, for the single package graphframes:graphframes that is not found on Maven central.
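
A minimal sketch of what that ordering could look like in the spark POM follows. It is not copied from the PR. It assumes the central entry simply repeats the id, name, URL and snapshot setting that the Super POM reports in the effective POM output above. It also assumes that Maven consults repositories in declaration order and that redeclaring the id central overrides the inherited definition:

```xml
<repositories>
    <!-- Assumption: declared first so Maven consults central before any other repository -->
    <repository>
        <id>central</id>
        <name>Central Repository</name>
        <url>https://repo.maven.apache.org/maven2</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <!-- Fallback: only needed for graphframes:graphframes -->
    <repository>
        <id>Spark Packages</id>
        <url>https://repos.spark-packages.org/</url>
    </repository>
</repositories>
```

Re-running mvn help:effective-pom -Dverbose -pl :spark-library after a change like this should show whether central is now listed ahead of Spark Packages.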

GCHQDeveloper314, Jul 06 '23 15:07