spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Loading the dbpedia schema takes 30s. Investigate and try to improve performance. Related to #45.
The Dgraph Java Client uses gRPC, which has a concept called a deadline for requests. This is a timeout that should be configurable and high enough for long-running queries of...
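In the actual connector this would be set on the gRPC stub (e.g. via `withDeadlineAfter` in the Java client); as an illustration only, here is a minimal Python sketch of the deadline idea, where the deadline is an absolute point in time computed once, rather than a per-attempt timeout:

```python
import time

class DeadlineExceeded(Exception):
    """Raised when a call outlives its deadline (mirrors gRPC's DEADLINE_EXCEEDED)."""

def with_deadline(fn, timeout_s):
    # A deadline is absolute: computed once up front, so all attempts made
    # through the returned wrapper share the same remaining time budget.
    deadline = time.monotonic() + timeout_s
    def call(*args, **kwargs):
        if time.monotonic() >= deadline:
            raise DeadlineExceeded(f"deadline of {timeout_s}s exceeded")
        return fn(*args, **kwargs)
    return call

# Hypothetical usage: wrap a query function with a generous deadline
query = with_deadline(lambda q: f"result for {q}", timeout_s=30.0)
query("schema {}")  # well within the deadline
```

A configurable value matters because a deadline that is right for point lookups is far too tight for full-schema or analytical queries.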
The Dgraph Java Client uses gRPC to communicate with the alpha nodes. It supports TLS and HTTPS. These should be supported by the connector as well. The connector also fetches some...
Similar to the wide node source, a data source that supports edges and list properties could be useful. Edges are predicates that have a list of uids as the value....
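A sketch of what such an edge source might produce, assuming a simple (subject, predicate, object-uid) triple layout: triples are pivoted into one row per subject, with each edge predicate holding its list of target uids. The predicate names here are made up for illustration:

```python
from collections import defaultdict

def to_wide_edge_rows(triples):
    """Pivot (subject, predicate, object-uid) triples into one row per
    subject, where each edge-predicate column holds a list of target uids."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    # freeze the nested defaultdicts into plain dicts
    return {s: dict(preds) for s, preds in rows.items()}

triples = [
    (0x1, "knows", 0x2),
    (0x1, "knows", 0x3),   # same predicate again -> list grows
    (0x2, "worksAt", 0x4),
]
rows = to_wide_edge_rows(triples)
# rows[0x1]["knows"] holds both target uids
```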
Check with Dgraph devs if they could add an operation to GraphQL that provides a sample of the uids that match a query. Retrieving only every `N`-th uid could...
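The sampling itself is trivial once such an operation exists; as a sketch, taking every `N`-th uid from an ordered result yields a roughly `1/N` sample that could, for instance, feed uid-range estimation for partitioning:

```python
def every_nth_uid(uids, n):
    """Return every n-th uid from an ordered uid list (a ~1/n sample).
    Slicing with a stride keeps the first uid of each n-sized block."""
    return uids[::n]

# 100 uids sampled with n=10 leaves 10 evenly spaced uids
sample = every_nth_uid(list(range(1, 101)), 10)
```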
Parsing the JSON result may return null values where required JSON members are expected (e.g. see commit e35680d). Guard against this.
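A defensive parsing sketch: the response member names (`result`, `uid`) are assumptions for illustration, but the pattern is the point — treat every required member as possibly `null` or absent instead of dereferencing it directly:

```python
import json

def parse_uids(payload):
    """Parse a JSON query response defensively: members that should be
    present may be null or missing, so guard rather than assume."""
    doc = json.loads(payload)
    result = doc.get("result") or []   # both missing and null become []
    uids = []
    for node in result:
        uid = node.get("uid")
        if uid is None:                # skip nodes without a uid
            continue
        uids.append(uid)
    return uids
```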
The wide node table schema uses predicate names as columns, allowing injection of arbitrary strings into column names. This should be reviewed and guarded against. For instance, a predicate `subject`...
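One possible guard, shown as a sketch: whitelist the characters allowed in a column name and replace everything else. Whether to sanitize, quote, or reject outright is a design choice the review would have to settle; the allowed character set below is an assumption:

```python
import re

# Assumed whitelist: letters, digits, underscore and dot (for dgraph.type etc.)
DISALLOWED = re.compile(r"[^A-Za-z0-9_.]")

def sanitize_column(predicate):
    """Replace characters outside the whitelist so a predicate name cannot
    smuggle quoting or control characters into a column name. A real
    implementation might instead reject or backtick-quote such names."""
    return DISALLOWED.sub("_", predicate)
```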
Selecting a column whose name contains a `.` (dot) confuses Spark: `df.select($"dgraph.type")` throws this exception: cannot resolve '`dgraph.type`' given input columns: [dgraph.graphql.schema, dgraph.type, ...] The reason is that `dgraph`...
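Spark resolves a dotted name as struct-field access unless the whole name is backtick-quoted. A small helper sketch (doubling embedded backticks, which to my understanding follows Spark SQL's quoted-identifier convention):

```python
def quote_column(name):
    """Wrap a column name in backticks so Spark treats 'dgraph.type' as a
    single column rather than field 'type' of struct 'dgraph'; embedded
    backticks are escaped by doubling them."""
    return "`" + name.replace("`", "``") + "`"

# df.select(quote_column("dgraph.type")) would then resolve the column as-is
quoted = quote_column("dgraph.type")
```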
Add a data source that does not read the actual data but provides performance metrics. Each partition sends a query to the Dgraph cluster and, besides the data, also retrieves...
With the performance data source #10 we can measure various metrics on the partition level. Create some benchmarks running the following queries against a Dgraph cluster. The cluster needs to...