AGE python driver works slowly on large data

Open Munmud opened this issue 1 year ago • 5 comments

Describe the bug

AGE python driver works slowly on large data (72538 nodes and 72485 edges)

How are you accessing AGE (Command line, driver, etc.)?

  • PostgreSQL 12.15 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu 1~22.04.1) 11.3.0, 64-bit
  • AGE version: pg12

What data setup do we need to do? Nothing

What is the necessary configuration info needed?

  • Nothing

What is the command that caused the error?

SELECT * 
FROM cypher('graph_name', $$
    MATCH (v)
    RETURN v
$$) as (v agtype);

ERROR:

  • Works slowly on large data
  • Loading the CSV data used in the AGE regression tests takes 2s. But getting all nodes and edges takes 9m+

**Expected behavior**
It should take a few seconds at most.
- I was able to get all the data in 2s with a slightly modified version of the Cypher query, so the original Cypher query should take a similar amount of time.
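
To narrow down where the 9m+ goes, it may help to time query execution and row fetching separately. The sketch below is only an assumption about how one might measure this: `build_cypher_sql` and `time_query` are hypothetical helper names, not part of the AGE driver, and `time_query` relies only on the standard DB-API `execute()` and `fetchall()` cursor methods.

```python
import time


def build_cypher_sql(graph_name, cypher, columns):
    """Wrap a Cypher query in the SELECT ... FROM cypher(...) form shown above.

    Hypothetical helper: every returned column is declared as agtype.
    """
    col_defs = ", ".join(f"{col} agtype" for col in columns)
    return f"SELECT * FROM cypher('{graph_name}', $$ {cypher} $$) AS ({col_defs});"


def time_query(cursor, sql):
    """Time execute() and fetchall() separately on any DB-API cursor."""
    t0 = time.perf_counter()
    cursor.execute(sql)
    t1 = time.perf_counter()
    rows = cursor.fetchall()
    t2 = time.perf_counter()
    return {"rows": len(rows), "execute_s": t1 - t0, "fetch_s": t2 - t1}


sql = build_cypher_sql("graph_name", "MATCH (v) RETURN v", ["v"])
print(sql)
# With a real connection (not created here), one would call, for example:
#   stats = time_query(conn.cursor(), sql)
```

If most of the time shows up in the fetch step rather than in `execute()`, that would point toward client-side result handling rather than the query itself. This is only a rough indication, since the split depends on the cursor type in use.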

Munmud avatar Jul 25 '23 14:07 Munmud

could you clarify what large data means in terms of size?

kapilt avatar Jul 26 '23 02:07 kapilt

could you clarify what large data means in terms of size?

@kapilt 72538 nodes and 72485 edges

Munmud avatar Jul 26 '23 11:07 Munmud

Can you describe the input data that you are using?

M4rcxs avatar Jul 26 '23 16:07 M4rcxs

But getting all nodes and edges takes 9m+

It might be associated with the way you get the edges (the query you provided does not show the retrieval of the edges, but I assume you are trying to return them in some way). If it is something like the query below, it may take more time than usual.

SELECT * from cypher('graph', $$
        MATCH (V)-[R]-(V2)
        RETURN V,R,V2
$$) as (V agtype, R agtype, V2 agtype);

This is discussed in issue #628.

As answered there by John:

The regular MATCH uses nested JOINS to find the results whereas the VLE MATCH uses a graph pathing function. It is a different engine that is finding the matches in each case. The problem with the regular MATCH is that these JOINS can nest way too deep, depending on the graph. This is compounded by the labels being in separate tables.

So the directed version of the query below may run faster with Python:

SELECT * from cypher('graph', $$
        MATCH (V)-[R]->(V2)
        RETURN V,R,V2
$$) as (V agtype, R agtype, V2 agtype);
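
One visible difference between the two queries is the number of rows returned: an undirected pattern matches each edge from both endpoints, while a directed pattern matches it once. The toy model below illustrates only that row-count difference on plain Python tuples. It is not how AGE executes MATCH, the function names are hypothetical, and it assumes the graph has no self-loops.

```python
def match_directed(edges):
    """Rows for (V)-[R]->(V2): one row per edge, in the edge's own direction."""
    return [(start, edge_id, end) for edge_id, start, end in edges]


def match_undirected(edges):
    """Rows for (V)-[R]-(V2): each edge is matched once from each endpoint.

    Assumes no self-loops, so both orientations are distinct rows.
    """
    rows = []
    for edge_id, start, end in edges:
        rows.append((start, edge_id, end))
        rows.append((end, edge_id, start))
    return rows


# Small illustrative graph: a triangle of three edges (edge_id, start, end).
edges = [("e1", "a", "b"), ("e2", "b", "c"), ("e3", "c", "a")]

directed_rows = match_directed(edges)
undirected_rows = match_undirected(edges)
print(len(directed_rows))    # 3
print(len(undirected_rows))  # 6
```

Under the same assumption, the 72485 edges mentioned in this issue would give 72485 rows for the directed query and 144970 for the undirected one, so the driver has twice as many rows to parse in the undirected case.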

MatheusFarias03 avatar Sep 15 '23 19:09 MatheusFarias03

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 11 '24 00:05 github-actions[bot]

This issue was closed because it has been stalled for further 7 days with no activity.

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]