Performance testing of Gaffer
It would be really helpful to generate synthetic data for performance testing of Gaffer at a reasonable scale on a Gaffer/Accumulo cluster.
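As a rough illustration of what "synthetic data" could mean here, the sketch below generates a reproducible list of random directed edges. The `SyntheticEdges` and `Edge` names are hypothetical and are not taken from Gaffer or gaffer-tools; a uniform random model is assumed purely for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator for illustration only; not part of Gaffer or gaffer-tools. */
final class SyntheticEdges {

    /** A directed edge between two vertex ids. */
    static final class Edge {
        final long source;
        final long destination;

        Edge(long source, long destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    /** Generates edgeCount edges with endpoints drawn uniformly from [0, vertexCount). */
    static List<Edge> generate(int edgeCount, int vertexCount, long seed) {
        if (edgeCount < 0 || vertexCount <= 0) {
            throw new IllegalArgumentException("edgeCount must be >= 0 and vertexCount > 0");
        }
        // A fixed seed makes the dataset reproducible between benchmark runs.
        Random random = new Random(seed);
        List<Edge> edges = new ArrayList<>(edgeCount);
        for (int i = 0; i < edgeCount; i++) {
            edges.add(new Edge(random.nextInt(vertexCount), random.nextInt(vertexCount)));
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Edge> edges = generate(5, 100, 42L);
        for (Edge edge : edges) {
            System.out.println(edge.source + " -> " + edge.destination);
        }
    }
}
```

Uniformly random endpoints do not reproduce the skewed degree distributions of real graphs, so a realistic benchmark would probably want a different model; the point here is only the seeded, repeatable generation.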
I've started looking at this issue from the perspective of the socialsensor graph database comparison benchmarks.
I'm integrating the Gaffer Accumulo store into their benchmarking project so that we can compare it against several other graph database technologies (including Neo4j and Titan).
They use a variety of real and synthetic datasets and a number of graph-oriented benchmarks, most of which are easy to support, but some may require a bit more thought or some new analytical functionality.
Once I've made some progress, I'll try to make this available in the gaffer-experimental project.
Steps
- Write install scripts for YARN/Accumulo/Gaffer
- Measure performance of adding data from HDFS (how many elements added per second?)
- Write install scripts for Gaffer REST API
- Measure performance of element retrieval from REST API
- Measure performance of streaming data into Gaffer
- Optimise
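For the "elements added per second" measurements in the steps above, a minimal timing sketch might look like the following. `ThroughputTimer` is a hypothetical helper, not a Gaffer class, and the workload in `main` is a stand-in loop; in a real test it would be replaced by whatever call adds elements to the store.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Gaffer: times a load and reports elements per second. */
final class ThroughputTimer {

    /** Converts an element count and an elapsed time in nanoseconds into elements per second. */
    static double elementsPerSecond(long elements, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return elements * 1_000_000_000.0 / elapsedNanos;
    }

    /** Runs the load once; the supplier returns how many elements it added. */
    static double measure(LongSupplier load) {
        long start = System.nanoTime();
        long added = load.getAsLong();
        // Clamp to 1 ns so a very fast load never divides by zero.
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return elementsPerSecond(added, elapsed);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): counts to one million instead of writing to a store.
        double rate = measure(() -> {
            long count = 0;
            for (int i = 0; i < 1_000_000; i++) {
                count++;
            }
            return count;
        });
        System.out.printf("%.0f elements/second%n", rate);
    }
}
```

A single timed run like this is noisy; repeating the measurement and discarding warm-up runs would give more trustworthy numbers on a real cluster.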
@gaffer01 it would be great to get some statistics included before we release version 1.0.0.
See https://github.com/gchq/gaffer-tools/tree/develop/random-element-generation and https://github.com/gchq/gaffer-tools/tree/develop/performance-testing, which may or may not work or be relevant.
Duplicates #3027