
What are the best practices for creating vertices and edges in AGE to avoid duplicates and maximize performance?

Open MatheusFarias03 opened this issue 1 year ago • 5 comments

Hi folks! I've been working on a Python project that collects 3851 vertices and creates 12507 edges. I want to store this data in AGE, but I'm not quite sure about the best approach.

The data collection is performed using Python's Beautiful Soup, a web scraping library. I then create objects representing vertices, edges, and the graph. Vertices and edges are stored in their respective arrays within the graph class.

For creating vertices in AGE, I use the MERGE clause to check if a vertex with the same label and properties already exists in the graph before creating it:

query = f'''
SELECT * FROM cypher('{graph_name}', $$ 
MERGE (:{vertex.label} {properties}) 
$$) AS (n agtype); 
'''

This works well for creating each vertex independently.

However, during testing with AGE, I noticed a problem when vertices are created this way and the following query is then executed:

query=f'''
SELECT * FROM cypher('{graph_name}', $$
MERGE (a:{from_v_label} {from_v_properties})-[e:{e_label}]->(b:{to_v_label} {to_v_properties})
$$) AS (e agtype);
'''

If the vertices already exist, it creates duplicates of them.

So, my question is, what is the best way to create vertices and edges without duplicating vertices that may already exist and with good performance? Thank you for taking the time to read my question.

MatheusFarias03 avatar Jan 23 '24 01:01 MatheusFarias03

If all vertices already exist before any edges are created, can't we just use:

MATCH (vertex1) MATCH (vertex2) CREATE (vertex1)-[:e]->(vertex2)
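
As written, the pattern above has no filters, so it would pair every vertex with every other vertex. A minimal Python sketch of the same idea with label and property filters follows, in the style of the f-string queries from the original post. The helper names `cypher_map` and `match_create_edge_query` are hypothetical, and the string escaping is an assumption that should be checked against the AGE version in use.

```python
def cypher_map(props):
    """Render a dict as a Cypher map literal, e.g. {name: 'abc'}.

    Hypothetical helper: the escaping below is a simple assumption and
    does not protect against values containing the $$ delimiter.
    """
    def lit(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    return "{" + ", ".join(f"{k}: {lit(v)}" for k, v in props.items()) + "}"


def match_create_edge_query(graph_name, from_label, from_props,
                            e_label, to_label, to_props):
    """Build a query that looks up two existing vertices and links them."""
    return (
        f"SELECT * FROM cypher('{graph_name}', $$\n"
        f"MATCH (a:{from_label} {cypher_map(from_props)})\n"
        f"MATCH (b:{to_label} {cypher_map(to_props)})\n"
        f"CREATE (a)-[e:{e_label}]->(b)\n"
        f"RETURN e\n"
        f"$$) AS (e agtype);"
    )


query = match_create_edge_query(
    "my_graph", "Person", {"name": "abc"}, "rel", "Person", {"name": "xyz"}
)
print(query)
```

Note that CREATE adds a new edge every time the query runs, so this only avoids duplicates if each edge is submitted once.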

rafsun42 avatar Jan 23 '24 18:01 rafsun42

Hi Rafsun! The vertices can be created first and the edges afterwards. I thought about this solution before, but wouldn't it be O(n) to create the vertices and then O(2n) per edge to check that both endpoint vertices exist before creating the edge? I've been thinking that this would take quite a while if we consider the total complexity of the operation to be O(n + 2n × m), where n is the number of vertices and m is the number of edges. I'm also assuming that AGE doesn't use indexes on the vertex and edge tables.
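
To put rough numbers on that estimate, here is a small sketch that plugs the counts from the original post (3851 vertices, 12507 edges) into the n + 2n × m formula. The second function is purely an assumption for comparison: it models what the cost might look like if each vertex lookup took about log2(n) steps, as with a tree-style index. It says nothing about what AGE actually does.

```python
import math

n = 3851   # number of vertices, from the original post
m = 12507  # number of edges, from the original post


def scan_cost(n, m):
    """Estimate from the thread: n vertex inserts plus two full scans per edge."""
    return n + 2 * n * m


def indexed_cost(n, m):
    """Hypothetical comparison: two ~log2(n) lookups per edge instead of scans."""
    return n + 2 * m * math.ceil(math.log2(n))


print(scan_cost(n, m))     # 96332765
print(indexed_cost(n, m))  # 304019
```

Under these assumptions the scan-based estimate is roughly 96 million row visits, which is why the lookup strategy matters more than the choice between MERGE and MATCH.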

MatheusFarias03 avatar Jan 23 '24 20:01 MatheusFarias03

Hi @MatheusFarias03 ,

Yes. That makes sense. As for 'duplicate vertices', does the same query also create duplicates in Neo4J? Is it an expected behavior?

rafsun42 avatar Jan 29 '24 17:01 rafsun42

I've checked the same queries with Neo4J and it works the same way: the vertices are duplicated there too. I guess it is expected behavior.

MatheusFarias03 avatar Feb 03 '24 14:02 MatheusFarias03

@MatheusFarias03

Do you think the following would work? It should not duplicate the vertices.

MERGE (n:Person {name:'abc'})  // creates the start vertex if it does not exist
MERGE (m:Person {name:'xyz'})  // creates the end vertex if it does not exist
MERGE (n)-[:rel]->(m)          // creates the edge if it does not exist
RETURN n
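
The reason this should help is that MERGE matches or creates its whole pattern at once, so merging each vertex separately and then the edge avoids recreating vertices that already exist. A minimal sketch of how the suggestion might be wrapped in the Python f-string style from the original post is shown below. The function name `merge_edge_query` is hypothetical, and the property arguments are assumed to be pre-rendered Cypher map strings such as `{name: 'abc'}`, as in the original queries.

```python
def merge_edge_query(graph_name, from_v_label, from_v_properties,
                     e_label, to_v_label, to_v_properties):
    """Build a query that merges both vertices first, then the edge.

    Hypothetical helper: the *_properties arguments are assumed to be
    already formatted Cypher map literals, e.g. "{name: 'abc'}".
    """
    return f"""
SELECT * FROM cypher('{graph_name}', $$
MERGE (a:{from_v_label} {from_v_properties})
MERGE (b:{to_v_label} {to_v_properties})
MERGE (a)-[e:{e_label}]->(b)
RETURN e
$$) AS (e agtype);
"""


query = merge_edge_query(
    "my_graph", "Person", "{name: 'abc'}", "rel", "Person", "{name: 'xyz'}"
)
print(query)
```

Whether chained MERGE clauses behave this way should be confirmed on the AGE version in use before relying on it for the full data set.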

rafsun42 avatar Feb 08 '24 01:02 rafsun42

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 11 '24 00:05 github-actions[bot]

This issue was closed because it has been stalled for further 7 days with no activity.

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]