webarchive-commons

Pom file dependency for Hadoop ("compile"->"provided")

Open willp-bl opened this issue 11 years ago • 2 comments

The webarchive-commons pom file declares a specific Hadoop version as a "compile" dependency. This should probably be "provided", so that the Hadoop jars are not bundled a second time when they are already on the cluster.

Also, my cluster is CDH4 but the version in Central depends on CDH3. I'm not yet sure whether this is what is causing my other issues.
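For illustration, a minimal sketch of what the suggested scope change might look like in the webarchive-commons pom.xml. It assumes the dependency is declared on hadoop-core, and the version shown (a CDH3 build) is only a placeholder for whatever the pom actually pins:

```xml
<!-- Sketch only: assumes the Hadoop dependency is hadoop-core;
     the version is a placeholder, not necessarily what the pom pins. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u4</version>
  <!-- "provided": available at compile time, but expected to be supplied
       by the runtime environment (here, the cluster) and not packaged -->
  <scope>provided</scope>
</dependency>
```

Note that provided-scope dependencies are not transitive, so downstream projects that need Hadoop on their classpath would have to declare it themselves.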

willp-bl, Apr 25 '14 15:04

I still haven't had time to look at this. In the meantime, anyone relying on this artefact can exclude its Hadoop dependency in their own pom.xml and add their own override:

<!-- Pull in webarchive-commons, but drop its transitive hadoop-core -->
<dependency>
  <groupId>org.netpreserve.commons</groupId>
  <artifactId>webarchive-commons</artifactId>
  <version>1.1.3</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Re-declare hadoop-core yourself as "provided"; adjust the version as needed -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u4</version>
  <scope>provided</scope>
</dependency>

anjackson, Aug 20 '14 21:08

The Hadoop dependency is needed for reading (W)ARCs from HDFS; I can't find any other uses of it in webarchive-commons. An OpenWayback deployment with WARCs stored in HDFS therefore depends on these libraries being included.

The easy solution is to change the dependency's scope to provided here and add hadoop-core as a dependency of OpenWayback. I'm not sure whether that requires a major release or whether the change is small enough for a minor release.
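On the OpenWayback side, a rough sketch of the added dependency might look like the following. This assumes it goes into whichever OpenWayback module reads from HDFS, and the version is simply the CDH3 one mentioned earlier in this thread; it would need to match the target cluster:

```xml
<!-- Sketch only: hypothetical addition to an OpenWayback pom.xml.
     The version is an assumption and should match the Hadoop cluster in use. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u4</version>
  <!-- No <scope> element, so Maven's default "compile" scope applies
       and the Hadoop jars are packaged with the deployment -->
</dependency>
```

With webarchive-commons declaring Hadoop as provided, this explicit declaration is what would put the Hadoop classes back on OpenWayback's runtime classpath.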

johnerikhalse, Apr 26 '16 13:04