spark-dgraph-connector
Support data locality in Spark DataSource
Spark DataSource V2 supports preferred locations to co-locate the processing of partitions with the location of the data. Host names where the data resides can be given to Spark via org.apache.spark.sql.sources.v2.reader.InputPartition.preferredLocations. If Spark nodes are co-located with dgraph instances, then reading predicates on the Spark nodes that are co-located with the dgraph alphas storing those predicates should reduce network traffic and improve read performance.
Simply return the targets of each partition as its preferred locations.
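A minimal sketch of the idea, assuming each partition knows the dgraph alphas that serve it as "host:port" strings. The `Target` and `PartitionSketch` classes are hypothetical stand-ins and not the connector's actual classes. In a real implementation the method would override `InputPartition.preferredLocations()`, which expects plain host names, so the port is stripped and duplicate hosts are removed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a dgraph alpha endpoint given as "host:port".
final class Target {
    final String host;
    final int port;

    Target(String target) {
        int idx = target.lastIndexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("expected host:port, got " + target);
        }
        this.host = target.substring(0, idx);
        this.port = Integer.parseInt(target.substring(idx + 1));
    }
}

// Hypothetical partition; real code would implement Spark's InputPartition interface.
final class PartitionSketch {
    private final List<Target> targets;

    PartitionSketch(List<Target> targets) {
        this.targets = targets;
    }

    // Mirrors the shape of InputPartition.preferredLocations():
    // returns the distinct host names of this partition's targets, in encounter order.
    public String[] preferredLocations() {
        return targets.stream()
                .map(t -> t.host)
                .distinct()
                .toArray(String[]::new);
    }
}
```

Preferred locations are only a hint to the scheduler, so the benefit depends on Spark executors actually running on the same hosts as the alphas and on the host names matching those that Spark knows its executors by.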