spark-dgraph-connector
Add benchmarking tool
With the performance data source (#10), we can measure various metrics at the partition level. Create benchmarks that run the following queries against a Dgraph cluster. To get comparable results and keep the tool easy to use, the benchmark tool itself should set up the cluster and populate it with a large (possibly synthetic) graph.
Partition Offset Performance
Read a SingletonPartition partitioned by uid into 100 partitions. The partitions use different offsets to read the same number of uids. Compare the Dgraph metrics across the offsets / partition ids.
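As a rough illustration of the layout this benchmark would exercise, the following sketch computes equally sized uid ranges that differ only in their offset. The function name `uid_range_partitions` and the total of 1,000,000 uids are assumptions made for this example; they are not part of the connector.

```python
from typing import List, Tuple


def uid_range_partitions(total_uids: int, num_partitions: int) -> List[Tuple[int, int, int]]:
    """Return (partition_id, offset, size) for equally sized uid ranges.

    Hypothetical helper: every partition reads the same number of uids,
    only the offset differs.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if total_uids % num_partitions != 0:
        # Keep all partitions the same size so that metrics are comparable.
        raise ValueError("total_uids must be divisible by num_partitions")
    size = total_uids // num_partitions
    return [(pid, pid * size, size) for pid in range(num_partitions)]


# Assumed graph size of 1,000,000 uids, split into 100 partitions.
partitions = uid_range_partitions(1_000_000, 100)
for pid, offset, size in partitions[:3]:
    print(f"partition {pid}: offset={offset} size={size}")
```

Plotting a metric against `offset` (or `partition_id`) would then show whether reads get slower the deeper the offset is.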
Partition Size Performance
Read a SingletonPartition partitioned by uid into 128 partitions. Repeatedly halve the number of partitions until only 1 partition remains, and compare the metrics across partition sizes. This should show how reading a partition scales with the partition size (at a given offset).
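The halving schedule can be sketched as follows. The helper name `halving_schedule` and the assumed graph size of 1,280,000 uids are illustrative only; the partition size is simply the total number of uids divided by the partition count.

```python
from typing import List


def halving_schedule(start: int = 128) -> List[int]:
    """Return the partition counts start, start/2, ..., 1.

    Hypothetical helper; requires a power of two so that halving ends at 1.
    """
    if start < 1 or start & (start - 1) != 0:
        raise ValueError("start must be a positive power of two")
    counts = []
    n = start
    while n >= 1:
        counts.append(n)
        n //= 2
    return counts


# Assumed graph size, chosen so that it divides evenly by 128.
total_uids = 1_280_000
for num_partitions in halving_schedule(128):
    partition_size = total_uids // num_partitions
    print(f"{num_partitions:>3} partitions -> {partition_size} uids per partition")
```

Each step doubles the partition size, so the measurements fall on a logarithmic scale from 10,000 to 1,280,000 uids per partition under the assumed graph size.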
Filesystem Caches
All individual runs should be repeated N times (e.g. 6), where every k-th repetition (e.g. every 2nd) is preceded by a filesystem cache flush. This makes it possible to measure the impact of filesystem caches on performance.
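One possible reading of this repetition scheme is sketched below. It assumes repetitions are counted from 0 and that repetition 0 is preceded by a flush, so with N=6 and k=2 the runs alternate between cold and warm caches. The names `flush_plan` and `run_repeated` are hypothetical, and the actual flush mechanism is passed in as a callable because it depends on the operating system and on where the Dgraph cluster runs.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def flush_plan(repetitions: int, flush_every: int) -> List[bool]:
    """For each repetition, tell whether a cache flush precedes it.

    Assumption: repetitions are 0-indexed and repetition 0 starts cold.
    """
    if repetitions < 1 or flush_every < 1:
        raise ValueError("repetitions and flush_every must be positive")
    return [rep % flush_every == 0 for rep in range(repetitions)]


def run_repeated(run: Callable[[], T], flush: Callable[[], None],
                 repetitions: int = 6, flush_every: int = 2) -> List[Tuple[int, bool, T]]:
    """Run a benchmark repeatedly, flushing caches before every k-th repetition."""
    results = []
    for rep, flush_first in enumerate(flush_plan(repetitions, flush_every)):
        if flush_first:
            flush()  # hypothetical hook; the real flush is platform specific
        results.append((rep, flush_first, run()))
    return results


# Usage with stand-in callables instead of a real benchmark and a real flush.
flushes = []
results = run_repeated(run=lambda: "metrics", flush=lambda: flushes.append(1))
print([(rep, flushed) for rep, flushed, _ in results])
```

Recording the `flush_first` flag alongside each result allows cold-cache and warm-cache runs to be compared directly.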