
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Neo4j Connector for Apache Spark

This repository contains the Neo4j Connector for Apache Spark.

License

The Neo4j Connector for Apache Spark is Apache 2 licensed.

Generating Documentation from Source

cd doc
# Install NodeJS dependencies
npm install
# Generate HTML/CSS from asciidoc
./node_modules/.bin/antora docs.yml
# Start local server to browse docs
npm run start

This serves the development docs at http://localhost:8000/.

Building

Building for Spark 2.4

You can build for Spark 2.4 with both Scala 2.11 and Scala 2.12

./maven-release.sh package 2.11 2.4
./maven-release.sh package 2.12 2.4

These commands will generate the corresponding targets:

  • spark-2.4/target/neo4j-connector-apache-spark_2.11-<version>_for_spark_2.4.jar
  • spark-2.4/target/neo4j-connector-apache-spark_2.12-<version>_for_spark_2.4.jar

Building for Spark 3

You can build for Spark 3 with both Scala 2.12 and Scala 2.13

./maven-release.sh package 2.12 3
./maven-release.sh package 2.13 3

These commands will generate the corresponding targets:

  • spark-3/target/neo4j-connector-apache-spark_2.12-<version>_for_spark_3.jar
  • spark-3/target/neo4j-connector-apache-spark_2.13-<version>_for_spark_3.jar

Integration with Apache Spark Applications

spark-shell, pyspark, or spark-submit

$SPARK_HOME/bin/spark-shell --jars neo4j-connector-apache-spark_2.12-<version>_for_spark_3.jar

$SPARK_HOME/bin/spark-shell --packages org.neo4j:neo4j-connector-apache-spark_2.12:<version>_for_spark_3
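
Once the shell starts with the connector on the classpath, you can read from Neo4j through the Spark DataSource API. A minimal sketch in the spark-shell; the connection URL, credentials, and the `Person` label are placeholders for your environment:

```scala
// Read nodes labeled :Person from Neo4j into a DataFrame.
// URL, username, password, and label below are example values.
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "password")
  .option("labels", "Person")
  .load()

df.show()
```

Writing back works symmetrically via `df.write.format("org.neo4j.spark.DataSource")` with the appropriate save mode and options.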

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "org.neo4j/neo4j-connector-apache-spark_2.11:<version>_for_spark_2.4"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "org.neo4j" % "neo4j-connector-apache-spark_2.12" % "<version>_for_spark_2.4"

Or, for Spark 3

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "org.neo4j" % "neo4j-connector-apache-spark_2.12" % "<version>_for_spark_3"

maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-connector-apache-spark_2.11</artifactId>
    <version>[version]_for_spark_2.4</version>
  </dependency>
</dependencies>

Or, for Spark 3:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
    <version>[version]_for_spark_3</version>
  </dependency>
</dependencies>

For more information about the available versions, visit https://neo4j.com/developer/spark/overview/#_compatibility