age Performance issue when upserting 10k+ edges using MERGE

Hi, I am trying to insert/update around 70K edges into the graph using age. I created the following function:

CREATE OR REPLACE FUNCTION create_update_edge(edge_list TEXT)
RETURNS void AS $$
DECLARE
  query TEXT;
BEGIN
  query := format($q$
    SELECT * FROM ag_catalog.cypher(
      'test_attribute_instance_graph',
      $cypher$
        UNWIND %s AS edge
        MERGE (a:attribute_instance {name: edge.source_name, value: edge.source_value})-[r:cooccurs_with]-(b:attribute_instance {name: edge.target_name, value: edge.target_value})
        SET r.count = edge.count
        SET r.first_seen = edge.first_seen
        SET r.last_seen = edge.last_seen
          
      $cypher$
    ) AS (ignored agtype);
  $q$, edge_list);

  EXECUTE query;

  RAISE NOTICE 'Edges merged and properties set.';
END;
$$ LANGUAGE plpgsql;

These are some of the points I had to take care of :

Function signature of ag_catalog.cyphe is (name, cstring, agtype)) So, I was not directly able to pass JSONB format as a parameter, and had to change the list of edges(each data is mapped to a dict) to a Cypher-compatible map(keys are unquoted) before passing it to the above function.
Trying to use MERGE..ON CREATE/ON MATCH SET gave syntax error, so I simply used MERGE SET to do the upsertion of edges.
Trying to B-tree indexing was a failed attempt, as again I was coming across a syntax error using the standard CREATE INDEX for B-tree command, so I used GIN Index.

Under this setup and batch unwind (batch size 1000 edges): I got the following results

Edges Inserted	Time (s)	Time (min)
1,000	0.62	0.01
2,000	16.35	0.27
3,000	22.42	0.37
4,000	69.63	1.16
5,000	104.75	1.75
6,000	177.12	2.95
7,000	447.66	7.46
8,000	598.38	9.97
9,000	913.82	15.23
10,000	1320.22	22.00

Clearly, the performance is pretty bad, so I was wondering if this is a limitation of age or a wrong implementation on my end. If someone could give me insights on my method and any other ways I should try inserting the data/or optimise the current method, it'd be of great help! Thankyou!!

May 20 '25 05:05 Yashmitha1

@Yashmitha1 One way you could improve this would be to use a directed edge (-[]-> or <-[]-). Currently, you are using a bi-directional edge (-[]-), which are notoriously slow. Maybe you could try 2 queries, one for each direction?

May 23 '25 15:05 jrgemignani

@Yashmitha1 Also could you share the execution plan?

May 23 '25 18:05 MuhammadTahaNaveed

Hi, doing something similar to this, we are using directed edges, and trying to create >8 million edges, and noticed this similar trend in performance, where after the approx 200k insertions, the performance starts to degrade very badly. We also tried to skip to that offset 200k and start our insertions from that offset, but the performance is still very poor. Is there any advice on this?

Aug 04 '25 08:08 ysvoon

@ysvoon Could you provide your query and execution plan?

Aug 04 '25 14:08 jrgemignani

Hi @jrgemignani, I've created a separate issue with the sample query and what we've tried and the query plan here https://github.com/apache/age/issues/2198

Aug 04 '25 14:08 ysvoon