# proofofconcept
Test the scalability of Neo4j using synthetic data.
Given the intended property graph schema (https://derivationmap.net/static/property_graph_schema.png), how does query performance vary as a function of graph size?
Tunable parameters (see the generator sketch after this list):
- number of derivations
- number of steps per derivation
- number of expressions per step (e.g., 1 or 2 inputs, 1 or 2 outputs, 1 or 2 feeds)
- number of symbols per expression
- degree of interconnectedness of derivations
- number of symbols
- number of operators
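
A minimal sketch of what such a generator could look like, assuming the official `neo4j` Python driver and a local instance at `bolt://localhost:7687`. The node labels (`Derivation`, `Step`, `Expression`, `Symbol`) and relationship types are hypothetical and would need to be aligned with the schema diagram above; the input/output/feed distinction is collapsed into a single `USES` relationship, and operators are omitted, to keep the sketch short.

```python
"""Synthetic-data generator sketch. Node labels (Derivation, Step,
Expression, Symbol) and relationship types are assumptions and would
need to be renamed to match the PDG property graph schema; operators
and derivation interconnectedness are omitted for brevity."""
import random

from neo4j import GraphDatabase  # pip install neo4j


def make_expression(expr_id: int, pool: list, symbols_per_expr: int) -> dict:
    """Fabricate one expression as a sum of symbols drawn from the pool."""
    symbols = random.sample(pool, symbols_per_expr)
    return {"expr_id": expr_id, "latex": " + ".join(symbols),
            "symbols": symbols}


def generate(num_derivations: int, steps_per_derivation: int,
             exprs_per_step: int, symbols_per_expr: int,
             num_symbols: int) -> list:
    """Return one record per (derivation, step); expressions share a
    global symbol pool, so derivations interconnect through symbols."""
    pool = [f"sym_{i}" for i in range(num_symbols)]
    rows, expr_id = [], 0
    for d in range(num_derivations):
        for s in range(steps_per_derivation):
            exprs = [make_expression(expr_id + i, pool, symbols_per_expr)
                     for i in range(exprs_per_step)]
            expr_id += exprs_per_step
            rows.append({"derivation": f"deriv_{d}", "step_id": s,
                         "expressions": exprs})
    return rows


# One parameterized load query; the input/output/feed roles are collapsed
# into a single USES relationship to keep the sketch short.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (d:Derivation {name: row.derivation})
CREATE (st:Step {step_id: row.step_id})
CREATE (d)-[:HAS_STEP]->(st)
WITH st, row
UNWIND row.expressions AS expr
CREATE (e:Expression {expr_id: expr.expr_id, latex: expr.latex})
CREATE (st)-[:USES]->(e)
WITH e, expr
UNWIND expr.symbols AS sym
MERGE (sy:Symbol {name: sym})
CREATE (e)-[:HAS_SYMBOL]->(sy)
"""

if __name__ == "__main__":
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    rows = generate(num_derivations=100, steps_per_derivation=10,
                    exprs_per_step=3, symbols_per_expr=4, num_symbols=50)
    with driver.session() as session:
        session.run(LOAD_QUERY, rows=rows)
    driver.close()
```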
If all properties are included (LaTeX strings, SymPy text, note text, operator attributes, symbol attributes), then, as the number of expressions and symbols grows (see the timing sketch after this list):
- how long does it take to add a new expression? (This involves a series of simple queries.)
- how long does it take to count the number of expressions? (A single simple query.)
- how long does it take to check the correctness of operators? (A moderately complicated query over properties.)
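
A hedged sketch of how those three timings could be measured, reusing the same hypothetical labels as the generator sketch above. The `Operator` arity properties (`min_arity`, `max_arity`) and the `USES_OPERATOR` relationship are likewise assumptions, not the real PDG schema.

```python
"""Timing sketch for the three benchmark questions (schema names are
assumptions carried over from the generator sketch above)."""
import time

from neo4j import GraphDatabase  # pip install neo4j

# "How long to add a new expression?" -- a parameterized write.
ADD_EXPRESSION = """
CREATE (e:Expression {expr_id: $expr_id, latex: $latex})
WITH e
UNWIND $symbols AS sym
MERGE (sy:Symbol {name: sym})
CREATE (e)-[:HAS_SYMBOL]->(sy)
"""

# "How long to count expressions?" -- the simple query.
COUNT_EXPRESSIONS = "MATCH (e:Expression) RETURN count(e) AS n"

# "How long to check operator correctness?" -- assumes Operator nodes
# carrying declared arity bounds and USES_OPERATOR edges from steps.
CHECK_OPERATORS = """
MATCH (st:Step)-[:USES_OPERATOR]->(op:Operator)
MATCH (st)-[:USES]->(e:Expression)
WITH op, st, count(e) AS arity
RETURN op.name AS operator,
       op.min_arity <= arity AND arity <= op.max_arity AS ok
"""


def median_seconds(session, cypher: str, params: dict = None,
                   repeats: int = 5) -> float:
    """Median wall-clock time of a query over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        session.run(cypher, params or {}).consume()  # force full execution
        samples.append(time.perf_counter() - start)
    return sorted(samples)[len(samples) // 2]


if __name__ == "__main__":
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        # Note: repeated write runs create duplicate nodes; acceptable
        # for a rough timing sketch.
        print("add expression:", median_seconds(
            session, ADD_EXPRESSION,
            {"expr_id": -1, "latex": "a + b", "symbols": ["a", "b"]}))
        print("count expressions:",
              median_seconds(session, COUNT_EXPRESSIONS))
        print("check operators:",
              median_seconds(session, CHECK_OPERATORS))
    driver.close()
```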
This task depends on:
- [ ] a synthetic data generator that approximates the data complexity of the PDG (sketched above)
- [ ] a standardized set of queries relevant to the PDG (sketched above)
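
Tying the two together, the experiment itself would sweep the graph size and record each timing. The sketch below reuses the hypothetical `generate`, `LOAD_QUERY`, `median_seconds`, `COUNT_EXPRESSIONS`, and `CHECK_OPERATORS` names from the sketches above, and wipes the database between runs (destructive on a shared instance).

```python
"""Size-sweep sketch: reuses the hypothetical generate(), LOAD_QUERY,
median_seconds(), COUNT_EXPRESSIONS, and CHECK_OPERATORS from above."""
import csv

from neo4j import GraphDatabase

if __name__ == "__main__":
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with open("timings.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["num_derivations", "count_s", "check_operators_s"])
        for num_derivations in (10, 100, 1000):
            with driver.session() as session:
                # Destructive reset so each run measures a known graph size.
                session.run("MATCH (n) DETACH DELETE n").consume()
                session.run(LOAD_QUERY, rows=generate(
                    num_derivations, steps_per_derivation=10,
                    exprs_per_step=3, symbols_per_expr=4,
                    num_symbols=50)).consume()
                writer.writerow([
                    num_derivations,
                    median_seconds(session, COUNT_EXPRESSIONS),
                    median_seconds(session, CHECK_OPERATORS),
                ])
    driver.close()
```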