spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

52 spark-dgraph-connector issues

Similar to #72, support the Dgraph Password type. Currently, this is a string; a struct might provide richer semantics. Definitely test with password example data.

enhancement

The response from Dgraph might exceed the gRPC maximum message size. Due to skewed data, some partitions might see larger results than others. When a gRPC exception occurs indicating the... A sketch for detecting this condition follows below.

enhancement
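
A minimal sketch of how such a condition could be detected, assuming the response is fetched through a gRPC stub that throws `io.grpc.StatusRuntimeException`; the recovery strategy (splitting the partition, retrying, or failing) is not part of this sketch and is what the issue would need to decide:

```scala
import io.grpc.{Status, StatusRuntimeException}

// Hypothetical helper: returns true when a gRPC call failed because the
// response exceeded the maximum inbound message size. gRPC reports this
// as RESOURCE_EXHAUSTED with a description like "gRPC message exceeds maximum size".
def isMessageTooLarge(e: StatusRuntimeException): Boolean =
  e.getStatus.getCode == Status.Code.RESOURCE_EXHAUSTED &&
    Option(e.getStatus.getDescription).exists(_.contains("exceeds maximum"))
```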

Dgraph encodes geo coordinates via the GeoJSON standard. Currently, the connector represents this as a JSON string, but it should be a proper struct with `type` and `coordinates` fields (see the schema sketch below). Depending on...

enhancement
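
A sketch of what such a struct could look like as a Spark schema, assuming only GeoJSON Point values for simplicity; other geometry types nest their coordinates more deeply and would need a different representation:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema for a geo predicate decoded from GeoJSON.
// A Point carries a flat array of doubles (longitude, latitude);
// Polygon and MultiPolygon would require nested arrays instead.
val geoType = StructType(Seq(
  StructField("type", StringType, nullable = false),
  StructField("coordinates", ArrayType(DoubleType, containsNull = false), nullable = false)
))
```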

Currently, the gRPC maximum message size is hard-coded to 24 MiB (https://github.com/G-Research/spark-dgraph-connector/blob/78cb2214efe5c2cb6a5a27215d76275b6c3cec62/src/main/scala/uk/co/gresearch/spark/dgraph/connector/package.scala#L196). Make this number configurable via `spark.read.option` (see the sketch below). Further, distinguish between the channel used in `SchemaProvider` and `DgraphExecutor`, and...

enhancement
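
A sketch of how the setting might be exposed to users, assuming a hypothetical option key `dgraph.grpc.maxMessageSize`; the actual key should follow the connector's existing option naming conventions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical option key; request 32 MiB instead of the hard-coded 24 MiB.
val triples = spark.read
  .format("uk.co.gresearch.spark.dgraph.triples")
  .option("dgraph.grpc.maxMessageSize", (32 * 1024 * 1024).toString)
  .load("localhost:9080")
```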

Allowing a default language, or a sequence of languages, would let sources pick only a single string value for multi-language predicates that have a `@lang` directive (see the sketch below). Querying for `@lang1:lang2:.`...

enhancement
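
A sketch of how a default-language preference might be passed in, assuming a hypothetical option key `dgraph.defaultLanguages`; the connector would translate the value into the `@lang1:lang2:.` form when querying predicates with a `@lang` directive:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical option: prefer English, fall back to German, then any language.
val nodes = spark.read
  .format("uk.co.gresearch.spark.dgraph.nodes")
  .option("dgraph.defaultLanguages", "en,de")
  .load("localhost:9080")
```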

With #63 the connector supports reading all languages of a predicate. Filter pushdown (projection and selection) probably does not work for those multi-language predicates. Make it work for all implemented...

bug
enhancement

Given a predicate partitioning, we can further partition each partition orthogonally by uids. Some partitions may contain more rows than others. By splitting large partitions into more parts than smaller... A sketch of the uid-range splitting follows below.

enhancement
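
A minimal sketch of the splitting idea, assuming each partition is described by a half-open uid range; the caller would choose more parts for larger partitions:

```scala
// Hypothetical representation of a uid range [first, until).
case class UidRange(first: Long, until: Long) {
  def size: Long = until - first
}

// Split a range into `parts` contiguous sub-ranges of roughly equal size,
// so partitions holding more uids can be split into more pieces.
def split(range: UidRange, parts: Int): Seq[UidRange] = {
  val step = math.max(1L, math.ceil(range.size.toDouble / parts).toLong)
  (range.first until range.until by step)
    .map(start => UidRange(start, math.min(start + step, range.until)))
}
```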

The Zero service provides predicate size statistics. A predicate partitioner could bin predicates by size, trying to achieve equal-sized predicate partitions. This would be useful to partition long-tail schemata where... A sketch of such binning follows below.

enhancement
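
A sketch of one way such a partitioner could bin predicates, assuming per-predicate sizes are available from Zero; this uses a simple greedy heuristic (largest predicate first into the currently smallest bin), not necessarily the approach the issue has in mind:

```scala
import scala.collection.mutable

// Greedy binning: assign each predicate (largest first) to the bin with the
// smallest total size, yielding roughly equal-sized predicate partitions.
def binBySize(predicateSizes: Map[String, Long], bins: Int): Seq[Seq[String]] = {
  val buckets = Array.fill(bins)(mutable.Buffer.empty[String])
  val totals = Array.fill(bins)(0L)
  predicateSizes.toSeq.sortBy(-_._2).foreach { case (predicate, size) =>
    val i = totals.indexOf(totals.min)
    buckets(i) += predicate
    totals(i) += size
  }
  buckets.map(_.toSeq).toSeq
}
```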

Spark DataSource V2 supports preferred locations to co-locate the processing of partitions with the location of the data. Host names where the data reside can be given to Spark via `org.apache.spark.sql.sources.v2.reader.InputPartition.preferredLocations`.... A sketch follows below.

enhancement
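
A sketch of how a partition could report its preferred hosts, assuming the Alpha targets serving the partition's data are already known when the partition is created; `createPartitionReader` is stubbed because the actual reading logic is out of scope here:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.reader.{InputPartition, InputPartitionReader}

// Hypothetical partition that knows which Dgraph Alpha hosts hold its data.
class DgraphInputPartition(targets: Seq[String]) extends InputPartition[InternalRow] {
  // Spark uses these host names to schedule the task close to the data.
  override def preferredLocations(): Array[String] =
    targets.map(target => target.split(":").head).toArray

  // Reading itself is unchanged and omitted from this sketch.
  override def createPartitionReader(): InputPartitionReader[InternalRow] = ???
}
```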

GraphFrames does not like columns with dots (.); see issue #14. We are currently renaming columns, which makes the GraphFrame schema deviate from all other sources. Get them fix...

enhancement