
test scalability of Neo4j using synthetic data

Open • bhpayne opened this issue 2 years ago • 0 comments

Given the intended property graph schema (https://derivationmap.net/static/property_graph_schema.png), how does performance vary with graph size?

Tunable parameters:

  • number of derivations
  • number of steps per derivation
  • number of expressions per step (e.g., 1 or 2 inputs, 1 or 2 outputs, 1 or 2 feeds)
  • number of symbols per expression
  • degree of interconnectedness of derivations
  • number of symbols
  • number of operators
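A minimal sketch of such a generator in Python, with one field per tunable parameter above. The node labels (`Derivation`, `Step`, `Expression`, `Symbol`, `Operator`), the relationship types (`HAS_STEP`, `HAS_INPUT`, `HAS_OUTPUT`, `HAS_FEED`, `HAS_SYMBOL`, `HAS_OPERATOR`), the field names, and the operator `arity` property are hypothetical stand-ins, not read from property_graph_schema.png. "Degree of interconnectedness" is modeled, as an assumption, as the probability that a step's input reuses an output expression from a different derivation.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticParams:
    # Hypothetical knob names, one per tunable parameter in the list above.
    n_derivations: int = 10
    steps_per_derivation: int = 5
    max_inputs: int = 2      # each step gets 1..max_inputs input expressions
    max_outputs: int = 2     # each step gets 1..max_outputs output expressions
    max_feeds: int = 2       # each step gets 1..max_feeds feed expressions
    symbols_per_expression: int = 3
    interconnect_prob: float = 0.1  # chance an input reuses another derivation's output
    n_symbols: int = 100
    n_operators: int = 10    # must be >= 1


def generate(p: SyntheticParams, seed: int = 0):
    """Return (nodes, edges): nodes maps label -> list of property dicts,
    edges is a list of (source_id, rel_type, target_id) tuples."""
    rng = random.Random(seed)  # seeded so every benchmark run sees the same graph
    nodes = {label: [] for label in ("Derivation", "Step", "Expression", "Symbol", "Operator")}
    edges = []
    outputs = []  # (derivation index, expression id): candidates for cross-derivation reuse

    for i in range(p.n_symbols):
        nodes["Symbol"].append({"id": f"sym{i}", "latex": f"x_{{{i}}}"})
    for i in range(p.n_operators):
        nodes["Operator"].append({"id": f"op{i}", "arity": rng.randint(1, 2)})

    def new_expression():
        n = len(nodes["Expression"])
        eid = f"expr{n}"
        # Placeholder property text; pad it to approximate real LaTeX/SymPy string lengths.
        nodes["Expression"].append({"id": eid, "latex": f"e_{{{n}}}", "sympy": f"e_{n}"})
        k = min(p.symbols_per_expression, p.n_symbols)
        for sym in rng.sample(nodes["Symbol"], k):
            edges.append((eid, "HAS_SYMBOL", sym["id"]))
        edges.append((eid, "HAS_OPERATOR", rng.choice(nodes["Operator"])["id"]))
        return eid

    for d in range(p.n_derivations):
        did = f"deriv{d}"
        nodes["Derivation"].append({"id": did, "note": f"derivation {d}"})
        for s in range(p.steps_per_derivation):
            sid = f"{did}_step{s}"
            nodes["Step"].append({"id": sid, "note": f"step {s}"})
            edges.append((did, "HAS_STEP", sid))
            for _ in range(rng.randint(1, p.max_inputs)):
                eid = None
                if outputs and rng.random() < p.interconnect_prob:
                    src_d, candidate = rng.choice(outputs)
                    if src_d != d:  # same-derivation picks fall through, so the
                        eid = candidate  # realized reuse rate is a bit below the knob
                if eid is None:
                    eid = new_expression()
                edges.append((sid, "HAS_INPUT", eid))
            for _ in range(rng.randint(1, p.max_feeds)):
                edges.append((sid, "HAS_FEED", new_expression()))
            for _ in range(rng.randint(1, p.max_outputs)):
                eid = new_expression()
                outputs.append((d, eid))
                edges.append((sid, "HAS_OUTPUT", eid))
    return nodes, edges
```

The output is plain lists of property dicts and `(source, type, target)` tuples, so it can be bulk-loaded into Neo4j in batches (for example with Cypher `UNWIND`) or written to CSV for a bulk import. Sweeping one knob at a time while holding the others fixed keeps the resulting timing curves interpretable.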

If all properties are included (LaTeX strings, SymPy text, note text, operator attributes, symbol attributes), then, as the number of expressions and symbols grows:

  • how long does it take to add a new expression? (This requires several simple queries.)
  • how long does it take to count the number of expressions? (A simple query.)
  • how long does it take to check the correctness of operators? (A moderately complicated query over properties.)
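One way to answer "how long does it take" as a function of scale is to time each query at several generated graph sizes and summarize the trend as a scaling exponent. The sketch below uses only the Python standard library, and the function names are hypothetical. It takes the median of repeated runs after untimed warm-up calls (so caches such as Neo4j's page cache and query-plan cache are populated first) and fits the least-squares slope of log(latency) against log(size).

```python
import math
import statistics
import time
from typing import Callable, Sequence


def median_latency(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of run(), measured after `warmup` untimed calls."""
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)  # median damps one-off GC or I/O spikes


def scaling_exponent(sizes: Sequence[float], latencies: Sequence[float]) -> float:
    """Least-squares slope of log(latency) vs log(size).

    A slope near 0 suggests cost barely depends on size; near 1, roughly linear growth.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in latencies]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

In use, `run` would wrap a single database call (for example, executing the count query through a driver session), `median_latency` would be called once per graph size, and the three resulting exponents, one per question above, can be compared directly.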

This task depends on:

  • [ ] a synthetic data generator that approximates the data complexity of the Physics Derivation Graph (PDG)
  • [ ] a standardized set of queries relevant to the PDG
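A possible starting point for the standardized query set, written in Cypher and wrapped in Python with the official `neo4j` driver. The schema is an assumption throughout: the labels, relationship types, and `id`/`latex`/`sympy`/`arity` properties are placeholders (not taken from property_graph_schema.png), and "operator correctness" is stood in for by a hypothetical arity check (an expression should reference at least as many symbols as its operator's `arity`). `add_expression` writes to the database, so it should run against a throwaway copy of the synthetic graph.

```python
# Hypothetical schema: labels, relationship types, and properties are placeholders.
QUERIES = {
    # The "simple query": count all expressions.
    "count_expressions": "MATCH (e:Expression) RETURN count(e) AS n",
    # One of the simple lookups behind "add a new expression": which symbols don't exist yet?
    "missing_symbols": (
        "UNWIND $symbol_ids AS sid "
        "OPTIONAL MATCH (s:Symbol {id: sid}) "
        "WITH sid, s WHERE s IS NULL "
        "RETURN collect(sid) AS missing"
    ),
    # Create the expression node, then link it to each existing symbol.
    "add_expression": (
        "CREATE (e:Expression {id: $id, latex: $latex, sympy: $sympy}) "
        "WITH e UNWIND $symbol_ids AS sid "
        "MATCH (s:Symbol {id: sid}) "
        "MERGE (e)-[:HAS_SYMBOL]->(s) "
        "RETURN count(s) AS linked"
    ),
    # Stand-in "operator correctness" check over properties.
    "operator_arity_violations": (
        "MATCH (e:Expression)-[:HAS_OPERATOR]->(op:Operator) "
        "OPTIONAL MATCH (e)-[:HAS_SYMBOL]->(s:Symbol) "
        "WITH e, op, count(DISTINCT s) AS n_symbols "
        "WHERE op.arity IS NULL OR n_symbols < op.arity "
        "RETURN e.id AS expression_id, op.id AS operator_id, op.arity AS arity, n_symbols"
    ),
}


def run_queries(uri: str, user: str, password: str, params_by_query: dict) -> dict:
    """Run each named query in params_by_query once; return its records as dicts."""
    from neo4j import GraphDatabase  # official driver (pip install neo4j)

    results = {}
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for name, params in params_by_query.items():
                result = session.run(QUERIES[name], params)
                results[name] = [record.data() for record in result]
    return results
```

For example, `run_queries("bolt://localhost:7687", "neo4j", "<password>", {"count_expressions": {}, "missing_symbols": {"symbol_ids": ["sym1", "sym2"]}})` runs just the two read-only queries. Whether an index exists on `id` strongly affects the cost of the `{id: ...}` lookups, so index settings are worth recording as part of each benchmark configuration.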

bhpayne • Mar 27 '22 17:03