spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

52 spark-dgraph-connector issues

Similar to #72, support the Dgraph Password type. Currently, this is a string; a struct might provide richer semantics. Definitely test with password example data.

enhancement

The response from Dgraph might exceed the gRPC maximum message size. Due to skewed data, some partitions might see larger results than others. When a gRPC exception occurs indicating the... A sketch for detecting this condition follows below.

enhancement
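
A minimal sketch of how such a condition could be detected, assuming the response is fetched through a gRPC stub that throws `io.grpc.StatusRuntimeException`; the recovery strategy (splitting the partition, retrying, or failing) is not part of this sketch and is what the issue would need to decide:

```scala
import io.grpc.{Status, StatusRuntimeException}

// Hypothetical helper: returns true when a gRPC call failed because the
// response exceeded the maximum inbound message size. gRPC reports this
// as RESOURCE_EXHAUSTED with a description like "gRPC message exceeds maximum size".
def isMessageTooLarge(e: StatusRuntimeException): Boolean =
  e.getStatus.getCode == Status.Code.RESOURCE_EXHAUSTED &&
    Option(e.getStatus.getDescription).exists(_.contains("exceeds maximum"))
```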

Dgraph encodes geo coordinates via the GeoJSON standard. Currently, the connector represents this as a JSON string, but it should be a proper struct with `type` and `coordinates` fields (see the schema sketch below). Depending on...

enhancement
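
A sketch of what such a struct could look like as a Spark schema, assuming only GeoJSON Point values for simplicity; other geometry types nest their coordinates more deeply and would need a different representation:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema for a geo predicate decoded from GeoJSON.
// A Point carries a flat array of doubles (longitude, latitude);
// Polygon and MultiPolygon would require nested arrays instead.
val geoType = StructType(Seq(
  StructField("type", StringType, nullable = false),
  StructField("coordinates", ArrayType(DoubleType, containsNull = false), nullable = false)
))
```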

Currently, the gRPC maximum message size is hard-coded to 24 MiB (https://github.com/G-Research/spark-dgraph-connector/blob/78cb2214efe5c2cb6a5a27215d76275b6c3cec62/src/main/scala/uk/co/gresearch/spark/dgraph/connector/package.scala#L196). Make this number configurable via `spark.read.option` (see the sketch below). Further, distinguish between the channel used in `SchemaProvider` and `DgraphExecutor`, and...

enhancement
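
A sketch of how the setting might be exposed to users, assuming a hypothetical option key `dgraph.grpc.maxMessageSize`; the actual key should follow the connector's existing option naming conventions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical option key; request 32 MiB instead of the hard-coded 24 MiB.
val triples = spark.read
  .format("uk.co.gresearch.spark.dgraph.triples")
  .option("dgraph.grpc.maxMessageSize", (32 * 1024 * 1024).toString)
  .load("localhost:9080")
```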

Allowing a default language, or a sequence of languages, would let sources pick only a single string value for multi-language predicates that have a `@lang` directive (see the sketch below). Querying for `@lang1:lang2:.`...

enhancement
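
A sketch of how a default-language preference might be passed in, assuming a hypothetical option key `dgraph.defaultLanguages`; the connector would translate the value into the `@lang1:lang2:.` form when querying predicates with a `@lang` directive:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical option: prefer English, fall back to German, then any language.
val nodes = spark.read
  .format("uk.co.gresearch.spark.dgraph.nodes")
  .option("dgraph.defaultLanguages", "en,de")
  .load("localhost:9080")
```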

With #63 the connector supports reading all languages of a predicate. Filter pushdown (projection and selection) probably does not work for those multi-language predicates. Make it work for all implemented...

bug
enhancement

Given a predicate partitioning, we can further partition each partition orthogonally by uids. Some partitions may contain more rows than others. By splitting large partitions into more parts than smaller... A sketch of the uid-range splitting follows below.

enhancement
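
A minimal sketch of the splitting idea, assuming each partition is described by a half-open uid range; the caller would choose more parts for larger partitions:

```scala
// Hypothetical representation of a uid range [first, until).
case class UidRange(first: Long, until: Long) {
  def size: Long = until - first
}

// Split a range into `parts` contiguous sub-ranges of roughly equal size,
// so partitions holding more uids can be split into more pieces.
def split(range: UidRange, parts: Int): Seq[UidRange] = {
  val step = math.max(1L, math.ceil(range.size.toDouble / parts).toLong)
  (range.first until range.until by step)
    .map(start => UidRange(start, math.min(start + step, range.until)))
}
```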

The Zero service provides predicate size statistics. A predicate partitioner could bin predicates by size, trying to achieve equal-sized predicate partitions. This would be useful to partition long-tail schemata where... A sketch of such binning follows below.

enhancement
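
A sketch of one way such a partitioner could bin predicates, assuming per-predicate sizes are available from Zero; this uses a simple greedy heuristic (largest predicate first into the currently smallest bin), not necessarily the approach the issue has in mind:

```scala
import scala.collection.mutable

// Greedy binning: assign each predicate (largest first) to the bin with the
// smallest total size, yielding roughly equal-sized predicate partitions.
def binBySize(predicateSizes: Map[String, Long], bins: Int): Seq[Seq[String]] = {
  val buckets = Array.fill(bins)(mutable.Buffer.empty[String])
  val totals = Array.fill(bins)(0L)
  predicateSizes.toSeq.sortBy(-_._2).foreach { case (predicate, size) =>
    val i = totals.indexOf(totals.min)
    buckets(i) += predicate
    totals(i) += size
  }
  buckets.map(_.toSeq).toSeq
}
```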

Spark DataSource V2 supports preferred locations to co-locate the processing of partitions with the location of the data. Host names where the data reside can be given to Spark via `org.apache.spark.sql.sources.v2.reader.InputPartition.preferredLocations`.... A sketch follows below.

enhancement
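
A sketch of how a partition could report its preferred hosts, assuming the Alpha targets serving the partition's data are already known when the partition is created; `createPartitionReader` is stubbed because the actual reading logic is out of scope here:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.reader.{InputPartition, InputPartitionReader}

// Hypothetical partition that knows which Dgraph Alpha hosts hold its data.
class DgraphInputPartition(targets: Seq[String]) extends InputPartition[InternalRow] {
  // Spark uses these host names to schedule the task close to the data.
  override def preferredLocations(): Array[String] =
    targets.map(target => target.split(":").head).toArray

  // Reading itself is unchanged and omitted from this sketch.
  override def createPartitionReader(): InputPartitionReader[InternalRow] = ???
}
```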

GraphFrames does not like columns with dots (.); see issue #14. We are currently renaming columns, which makes the GraphFrame schema deviate from all other sources. Get them fix...

enhancement