spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

Results 52 spark-dgraph-connector issues

Loading the dbpedia schema takes 30s. Investigate and try to improve performance. Related to #45.

enhancement

The Dgraph Java Client uses gRPC, which has a concept called a deadline for requests. This is a timeout that should be configurable and high enough for long-running queries of...

enhancement

The Dgraph Java Client uses gRPC to communicate with the alpha nodes. It supports TLS and HTTPS. These should be supported by the connector as well. The connector also fetches some...

enhancement

Similarly to the wide node source, a data source that supports edges and list properties could be useful. Edges are predicates that have a list of uids as the value....

Check with Dgraph devs if they could add an operation to GraphQL that provides you a sample of the uids that match a query. Retrieving only every `N`-th uid could...

enhancement
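To make the sampling request above concrete, here is a minimal client-side sketch of what "every `N`-th uid" means. It assumes the uids are already available as integers; the function name `every_nth` is hypothetical and is not part of Dgraph or the connector. The issue asks for this to happen on the server, which the sketch does not do.

```python
def every_nth(uids, n):
    """Return every n-th uid in ascending order, starting with the smallest.

    Illustration only: the issue asks Dgraph to do this server-side,
    so that the full list of uids never has to be transferred.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sorted(uids)[::n]


if __name__ == "__main__":
    print(every_nth([7, 3, 1, 9, 5, 2, 8, 4, 10, 6], 3))
```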

Parsing the JSON result may return null values where required JSON members are expected (e.g. see commit e35680d). Guard against this.

bug
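The null-value bug above could be guarded against along the following lines. This is a sketch under assumptions, not the connector's code: the response shape (a top-level `result` array of objects with a `uid` string) and the names `require`, `parse_uids`, and `MalformedResponse` are hypothetical and chosen only for illustration.

```python
import json


class MalformedResponse(ValueError):
    """Raised when a required JSON member is missing, null, or of the wrong type."""


def require(obj, key, expected_type):
    """Return obj[key], failing loudly instead of passing a null downstream."""
    if not isinstance(obj, dict):
        raise MalformedResponse(
            f"expected a JSON object containing {key!r}, got {type(obj).__name__}"
        )
    value = obj.get(key)
    if value is None:
        raise MalformedResponse(f"required member {key!r} is missing or null")
    if not isinstance(value, expected_type):
        raise MalformedResponse(
            f"member {key!r} should be {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


def parse_uids(payload, member="result"):
    """Parse a JSON payload and return the uid of every node in `member`."""
    root = json.loads(payload)  # may itself be None for the payload "null"
    nodes = require(root, member, list)
    return [require(node, "uid", str) for node in nodes]


if __name__ == "__main__":
    print(parse_uids('{"result": [{"uid": "0x1"}, {"uid": "0x2"}]}'))
```

Checking each required member at the point of access turns a later null dereference into an immediate error that names the offending member.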

The wide node table schema uses predicate names as columns, allowing injection of arbitrary strings into column names. This should be reviewed and guarded against. For instance, a predicate `subject`...

bug
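One possible guard for the column-name issue above is to validate predicate names before they become columns. The sketch below assumes that the wide node table reserves a `subject` column, as the example in the issue suggests; the function name `check_predicate_columns` and the choice to raise rather than rename are assumptions. Names are compared case-insensitively because Spark resolves column names case-insensitively by default.

```python
def check_predicate_columns(predicates, reserved=("subject",)):
    """Return the predicate names unchanged if they are safe to use as columns.

    Raises ValueError when a predicate collides with a reserved column
    or with another predicate (ignoring case).
    """
    reserved_lower = {name.lower() for name in reserved}
    seen = {}
    for name in predicates:
        key = name.lower()
        if key in reserved_lower:
            raise ValueError(f"predicate {name!r} collides with a reserved column")
        if key in seen:
            raise ValueError(f"predicate {name!r} collides with {seen[key]!r}")
        seen[key] = name
    return list(predicates)


if __name__ == "__main__":
    print(check_predicate_columns(["name", "dgraph.type"]))
```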

Selecting a column by name that contains a `.` (dot) confuses Spark: `df.select($"dgraph.type")` throws this exception: ``cannot resolve '`dgraph.type`' given input columns: [dgraph.graphql.schema, dgraph.type, ...]`` The reason is that `dgraph`...

documentation
enhancement
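For the dotted-column issue above, Spark accepts a column name wrapped in backticks as a single identifier, so `` `dgraph.type` `` selects the column named `dgraph.type`. The helper below is a small sketch of that quoting for PySpark users; the name `quote_column` is hypothetical and not part of Spark or the connector.

```python
def quote_column(name):
    """Wrap a column name in backticks so Spark treats dots as part of the name.

    Backticks inside the name are escaped by doubling them.
    """
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    # With a DataFrame `df` at hand, this would be used as:
    #   df.select(quote_column("dgraph.type"))
    print(quote_column("dgraph.type"))
```

In Scala the same quoting applies, for example `` df.select($"`dgraph.type`") ``.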

Add a data source that does not read the actual data but provides performance metrics. Each partition sends a query to the Dgraph cluster and retrieves, besides the data, also...

enhancement

With the performance data source #10 we can measure various metrics on the partition level. Create some benchmarks running the following queries against a Dgraph cluster. The cluster needs to...

enhancement