spark-dgraph-connector: Support data locality in Spark DataSource

Status: open. EnricoMi opened this issue 4 years ago; 0 comments.

Spark DataSource V2 supports preferred locations, which let Spark schedule the processing of a partition on the host where its data resides. A data source provides these host names to Spark via org.apache.spark.sql.sources.v2.reader.InputPartition.preferredLocations. If Spark nodes are co-located with dgraph instances, reading predicates on the Spark nodes that also run the dgraph alphas storing those predicates should reduce network traffic and improve read performance.

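This could be implemented by simply returning the host names of each partition's targets (the dgraph alphas the partition reads from) as its preferred locations.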
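A minimal sketch of that idea, written in Java because InputPartition is a Java interface. DgraphPartition and its targets list are hypothetical stand-ins for the connector's own partition class, and targets are assumed to be plain "host:port" strings (no IPv6 literals). The port is stripped because Spark expects host names as preferred locations, and duplicate hosts are collapsed so each host appears once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a connector partition. In a real connector this
// class would implement Spark's InputPartition and override preferredLocations().
class DgraphPartition {
    // dgraph alpha endpoints this partition reads from, assumed to be "host:port"
    private final List<String> targets;

    DgraphPartition(List<String> targets) {
        this.targets = List.copyOf(targets);
    }

    // Drop the port, since Spark expects host names, not host:port pairs.
    static String hostOf(String target) {
        int colon = target.lastIndexOf(':');
        return colon < 0 ? target : target.substring(0, colon);
    }

    // Same shape as InputPartition.preferredLocations(): one entry per distinct
    // host, in the order the targets were given.
    String[] preferredLocations() {
        Set<String> hosts = new LinkedHashSet<>();
        for (String target : targets) {
            hosts.add(hostOf(target));
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        DgraphPartition partition =
            new DgraphPartition(List.of("alpha1:9080", "alpha2:9080", "alpha1:9081"));
        // prints "alpha1,alpha2"
        System.out.println(String.join(",", partition.preferredLocations()));
    }
}
```

Preferred locations are only a scheduling hint: if no executor runs on a listed host, Spark still runs the partition elsewhere after its locality wait expires.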

Sep 14 '20 07:09 EnricoMi