Performance issue
🔍 Performance Optimization for Multi-Hop Traversal in Apache AGE
Context:
We are currently using Apache AGE and have the following graph structure:
(:A)-[:HAS_Y]->(:Y)
(:A)-[:HAS_Z]->(:Z)
(:A)-[:HAS_D]->(:D)
(similar for 7 relation types and 8 node types total)
Our typical traversal pattern in Neo4j was:
MATCH (n:Y {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
UNION ALL
MATCH (n:Z {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
We expect:
~500 million nodes
3–4x that number in relationships
Question:
What kind of indexing strategy or query optimization would you recommend in Apache AGE for improving the performance of multi-hop traversal queries like [*..4]?
Any guidance or best practices on the following would be highly appreciated:
Node property indexing
Relationship indexing (e.g., start_id, end_id)
Traversal optimizations
Current Setup:
We currently have:
Indexes on all relevant node properties
start_id and end_id indexes on all relationships
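For reference, the setup described above might correspond to DDL along the following lines. This is a sketch, not the actual schema. It assumes AGE's usual storage layout, where each label is a table in a schema named after the graph, with `id` and `properties` columns (plus `start_id` and `end_id` on edge labels). It also assumes the filtered property is `value`, as in the query example. The index names are made up for illustration.

```sql
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- GIN index on the whole property map. This can serve containment-style
-- filters such as MATCH (n:Y {value: '...'}).
CREATE INDEX IF NOT EXISTS y_properties_gin
    ON user_unification."Y" USING gin (properties);

-- Expression B-tree index for equality filters written as
-- WHERE n.value = '...'. Assumption: AGE compiles n.value into a call to
-- agtype_access_operator on the properties column.
CREATE INDEX IF NOT EXISTS y_value_btree
    ON user_unification."Y" USING btree (
        agtype_access_operator(VARIADIC ARRAY[properties, '"value"'::agtype])
    );

-- Edge endpoint indexes, one pair per edge label (HAS_Y shown here).
CREATE INDEX IF NOT EXISTS has_y_start_id
    ON user_unification."HAS_Y" (start_id);
CREATE INDEX IF NOT EXISTS has_y_end_id
    ON user_unification."HAS_Y" (end_id);
```

Which of the two property indexes the planner picks, if either, depends on the AGE version and on how the filter is written, so the plan should be checked rather than assumed.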
Sample test data:
~27 million vertices
~23 million edges
Query example:
SELECT d
FROM ag_catalog.cypher('user_unification', $$
MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
MATCH (n)-[*..2]-(d:A)
RETURN d
$$) as (d ag_catalog.agtype);
Execution time:
For [*..2]: ~30 seconds
For [*..4]: >150 seconds (often fails to complete)
Expected execution time: ≤10 ms for [*..2]
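To see where the time goes, it may help to look at the plan for the anchor lookup on its own, separately from the variable-length expansion. The sketch below assumes the installed AGE version accepts `EXPLAIN ANALYZE` as a prefix inside the `cypher()` call. The column definition after `AS` is still required by the function syntax, and the column name `plan` is arbitrary.

```sql
-- Plan and timing for the starting-node lookup only.
SELECT *
FROM ag_catalog.cypher('user_unification', $$
    EXPLAIN ANALYZE
    MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
    RETURN n
$$) AS (plan ag_catalog.agtype);
```

If the output shows a sequential scan on the `Y` table, the property index is not being used for this filter, and that lookup cost is paid before any traversal starts.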
Any suggestions or feedback from the AGE team would be incredibly helpful. Thanks in advance!
@xephtar Is there a reason you need to match without direction, i.e. -[]-? That is by far one of the most resource-intensive match patterns, and it basically negates the value of a directed graph. Also, do you need two MATCH clauses?
Our graph is directed and connected, with relationships like A -> X or A -> Y. When we perform a directed search starting from node X, we cannot reach its second-level neighbors, because the traversal only follows edges in their stored direction.
The reason we are considering treating the graph as undirected is that the nodes directly connected to X are also connected to other labels via outgoing relationships. With a purely directed traversal, we miss those connections.
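One possible middle ground is to spell out the direction of each hop instead of dropping direction for the whole path. The sketch below assumes every relationship points from an `A` node to a node of another label, as in the structure listed at the top of the thread. Under that assumption, a path from `Y` back to `A` first goes against an edge, and a 3-hop path goes against, along, then against. The variable names `a` and `shared`, the `DISTINCT`, and the `LIMIT 50` are illustrative choices, and nothing here has been confirmed by the AGE team as the recommended fix.

```sql
-- 1 hop: A nodes that point directly at the matched Y node.
SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y {value: 'a0de44c7fc8cb783'})<-[:HAS_Y]-(d:A)
    RETURN d
$$) AS (d ag_catalog.agtype);

-- 3 hops: other A nodes that share any neighbor with those A nodes.
-- Each hop has a fixed direction, so only the matching edge index
-- (end_id or start_id) is needed at each step.
SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y {value: 'a0de44c7fc8cb783'})<-[:HAS_Y]-(a:A)-[]->(shared)<-[]-(d:A)
    RETURN DISTINCT d
    LIMIT 50
$$) AS (d ag_catalog.agtype);
```

If only A-to-other edges exist, paths from `Y` to `A` always have odd length, so `[*..2]` and `[*..4]` would correspond to the 1-hop and the 1-hop plus 3-hop cases above.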