spark-dgraph-connector Add benchmarking tool

Open EnricoMi opened this issue 4 years ago • 0 comments

With the performance data source (#10), we can measure various metrics at the partition level. Create benchmarks that run the following queries against a Dgraph cluster. To get comparable results and keep the tool easy to use, the benchmark tool should set up the cluster and populate it with a large (possibly synthetic) graph.

Partition Offset Performance

Read a SingletonPartition partitioned by uid into 100 partitions. The partitions use different offsets to read the same number of uids. Compare the Dgraph metrics across offsets (partition ids).
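
A minimal sketch of how the offsets for this benchmark could be derived, assuming the total number of uids in the graph is known up front. The function name `uid_partitions` and its parameters are hypothetical and not part of the connector; the remainder of an uneven split is dropped so that every partition reads the same number of uids.

```python
def uid_partitions(total_uids, num_partitions=100):
    """Return one (offset, size) pair per partition.

    Hypothetical helper: every partition has the same size and differs
    only in its offset, so metrics can be compared along the offset.
    """
    size = total_uids // num_partitions  # uids left over by the split are ignored
    return [(i * size, size) for i in range(num_partitions)]


partitions = uid_partitions(1_000_000)
print(partitions[:3])  # [(0, 10000), (10000, 10000), (20000, 10000)]
```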

Partition Size Performance

Read a SingletonPartition partitioned by uid into 128 partitions. Repeatedly halve the number of partitions down to 1 and compare the metrics across partition sizes. This should show how reading a partition scales with the partition size (at a given offset).
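
The halving schedule could look like the following sketch. The names `halving_partition_counts` and `partition_sizes` are hypothetical, and the total number of uids is an assumed input rather than something the issue specifies.

```python
def halving_partition_counts(start=128):
    """Return the partition counts start, start/2, ..., 1."""
    counts = []
    n = start
    while n >= 1:
        counts.append(n)
        n //= 2
    return counts


def partition_sizes(total_uids, start=128):
    """Pair each partition count with the resulting uids per partition."""
    return [(n, total_uids // n) for n in halving_partition_counts(start)]


print(halving_partition_counts())  # [128, 64, 32, 16, 8, 4, 2, 1]
```

Each step doubles the partition size, so the measured metrics can be plotted against partition size on a logarithmic axis.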

Filesystem Caches

Repeat every individual run N times (e.g. 6), and precede every k-th repetition (e.g. every 2nd) with a filesystem cache flush. This makes it possible to measure the impact of filesystem caches on performance.
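
One possible reading of this schedule, as a sketch: the flush happens before repetitions 0, k, 2k, ... (0-based), so with N = 6 and k = 2 each cold run is followed by one warm run. The function `run_repetitions` and the `run` and `flush_caches` callbacks are hypothetical placeholders; how the flush is actually performed depends on the platform (on Linux, for instance, by running `sync` and writing `3` to `/proc/sys/vm/drop_caches` as root).

```python
def run_repetitions(run, flush_caches, repetitions=6, flush_every=2):
    """Run a benchmark repeatedly, flushing caches before every k-th run.

    Assumption: repetition indices are 0-based and the flush precedes
    repetitions 0, k, 2k, ... Returns (index, cold, result) tuples, where
    cold marks runs that directly follow a cache flush.
    """
    results = []
    for i in range(repetitions):
        cold = i % flush_every == 0
        if cold:
            flush_caches()
        results.append((i, cold, run()))
    return results


# Usage with stand-in callbacks that only record what happened.
flushes = []
results = run_repetitions(lambda: "metrics", lambda: flushes.append("flush"))
print([cold for _, cold, _ in results])  # [True, False, True, False, True, False]
```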

Jun 15 '20 09:06 EnricoMi
