
Performance issue

Open xephtar opened this issue 5 months ago • 2 comments

🔍 Performance Optimization for Multi-Hop Traversal in Apache AGE

Context:

We are currently using Apache AGE and have the following graph structure:

(:A)-[:HAS_Y]->(:Y)

(:A)-[:HAS_Z]->(:Z)

(:A)-[:HAS_D]->(:D)

(and similarly for the remaining relationship types: 7 relationship types and 8 node types in total)

Our typical traversal pattern in Neo4j was:

MATCH (n:Y {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
UNION ALL
MATCH (n:Z {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
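A variable-length pattern such as [r*..4] is matched path by path. Under Cypher's path semantics a relationship is not reused within a single match, but the same end node is returned once for every distinct path that reaches it. The minimal Python sketch below contrasts that path enumeration with a breadth-first search that visits each node once. The toy graph and all function names are hypothetical and illustrative only; they are not part of Neo4j or AGE.

```python
from collections import defaultdict

def build_undirected(edges):
    """Undirected adjacency list: node -> [(edge_id, neighbour), ...]."""
    adj = defaultdict(list)
    for eid, (src, dst) in enumerate(edges):
        adj[src].append((eid, dst))
        adj[dst].append((eid, src))
    return adj

def enumerate_paths(adj, start, max_hops):
    """Yield the end node of every path of 1..max_hops hops from start,
    never reusing an edge within a single path (Cypher-style matching)."""
    stack = [(start, 0, frozenset())]
    while stack:
        node, depth, used = stack.pop()
        if depth == max_hops:
            continue
        for eid, nxt in adj[node]:
            if eid not in used:
                yield nxt
                stack.append((nxt, depth + 1, used | {eid}))

def reachable(adj, start, max_hops):
    """Distinct nodes within 1..max_hops hops, each visited once (BFS)."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _, nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append(nxt)
        frontier = next_frontier
    return seen - {start}

# Hypothetical toy data: three A nodes sharing one Y and one Z node,
# each with its own D node; edges point from A to the other labels.
edges = []
for i in range(3):
    edges += [(f"a{i}", "y0"), (f"a{i}", "z0"), (f"a{i}", f"d{i}")]
adj = build_undirected(edges)

ends = list(enumerate_paths(adj, "y0", 4))
print(len(ends), sum(e.startswith("a") for e in ends))  # 27 9
print(sorted(n for n in reachable(adj, "y0", 4) if n.startswith("a")))  # ['a0', 'a1', 'a2']
```

On this toy graph, 27 paths of up to 4 hops start at y0 and 9 of them end at an A node, yet only 3 distinct A nodes exist. The gap between paths and distinct endpoints grows quickly with hop count and node degree.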

We expect:

~500 million nodes

3–4× as many relationships (roughly 1.5–2 billion)

Question:

What indexing strategy or query optimizations would you recommend in Apache AGE to improve the performance of multi-hop traversal queries such as [*..4]?

Any guidance or best practices on the following would be highly appreciated:

Node property indexing

Relationship indexing (e.g., on start_id and end_id)

Traversal optimizations

Current Setup:

We currently have:

Indexes on all relevant node properties

start_id and end_id indexes on all relationships

Sample test data:

~27 million vertices

~23 million edges

Query example:

SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
    MATCH (n)-[*..2]-(d:A)
    RETURN d
$$) as (d ag_catalog.agtype);

Execution time:

For [*..2]: ~30 seconds

For [*..4]: >150 seconds (often fails to complete)

Target execution time: ≤10 ms for [*..2]

Any suggestions or feedback from the AGE team would be incredibly helpful. Thanks in advance!

xephtar, Jun 11 '25 10:06

@xephtar Is there a reason you need to match without direction, -[]-? That is by far one of the most resource-intensive match patterns, and it basically negates the value of a directed graph. Additionally, do you need two MATCH clauses?

jrgemignani, Jun 11 '25 15:06

Our graph is directed and connected, with relationships such as A -> X and A -> Y. When we run a directed search starting from an X node, we cannot reach its second-level neighbors by hopping across relationships, because a directed traversal only follows edges in their stored direction.

The reason we are considering treating the graph as undirected is that some nodes directly connected to X are also connected to nodes of other labels through outgoing relationships. Because of the edge direction, a directed traversal misses these relevant connections.
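If, as the schema at the top of the issue suggests, every relationship type points from an A node to one of the other labels, then an undirected path between a Y node and an A node must alternate direction: Y <- A -> X <- A, and so on. A nodes then appear only at odd hop counts (so [*..2] ending at d:A can only return direct neighbours), and each hop needs just one of the two edge indexes: an end_id lookup when stepping back onto A, and a start_id lookup when stepping forward from A. In Cypher this corresponds to spelling the directions out, for example a pattern of the form (n:Y)<-[]-(:A)-[]->()<-[]-(d:A) for the three-hop case, instead of an undirected -[*..4]-. The Python sketch below assumes that schema, models the two indexes as dictionaries (all names are hypothetical, not AGE internals), and checks the alternating traversal against a plain undirected BFS. Like a query with DISTINCT, it returns each A node once rather than once per path.

```python
from collections import defaultdict

def build_indexes(edges):
    """Model separate indexes on start_id and end_id of an edge table."""
    by_start, by_end = defaultdict(set), defaultdict(set)
    for start_id, end_id in edges:
        by_start[start_id].add(end_id)
        by_end[end_id].add(start_id)
    return by_start, by_end

def a_nodes_directed(by_start, by_end, start, max_hops):
    """A nodes within max_hops of a non-A start, following edge direction.

    Assumes every edge points from an A node to a non-A node: odd hops step
    backwards onto A (end_id lookup), even hops step forwards off A
    (start_id lookup). Each hop consults only one index.
    """
    found, frontier, seen = set(), {start}, {start}
    for hop in range(1, max_hops + 1):
        index = by_end if hop % 2 == 1 else by_start
        frontier = {n for v in frontier for n in index[v]} - seen
        seen |= frontier
        if hop % 2 == 1:  # odd hops land on A nodes
            found |= frontier
    return found

def a_nodes_undirected(by_start, by_end, start, max_hops, is_a):
    """Reference answer: undirected BFS (both indexes per hop), keep A nodes."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {n for v in frontier for n in by_start[v] | by_end[v]} - seen
        seen |= frontier
    return {n for n in seen if is_a(n)}

# Hypothetical toy edges, each pointing from an A node to another label.
edges = [("a0", "y0"), ("a0", "z0"), ("a1", "z0"),
         ("a1", "d1"), ("a2", "d1"), ("a3", "d9")]
by_start, by_end = build_indexes(edges)

def is_a(node):
    return node.startswith("a")

for k in (2, 4, 6):
    directed = a_nodes_directed(by_start, by_end, "y0", k)
    assert directed == a_nodes_undirected(by_start, by_end, "y0", k, is_a)
    print(k, sorted(directed))
```

Because each frontier is built with one index lookup per node and nodes are never revisited, the cost of this sketch grows with the number of distinct nodes reached rather than with the number of paths.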

xephtar, Jun 12 '25 06:06