agensgraph icon indicating copy to clipboard operation
agensgraph copied to clipboard

Performance reduced when joining anonymous vertices

Open jshen-r7 opened this issue 2 years ago • 1 comments

This is a side effect of #599 - we've noticed that performance on queries with anonymous vertices is noticeably slower. We explicitly leave out the vertex name to avoid unnecessary data fetches.

Here are the query plans for a simple graph on AG 2.13.0:

staging=# create graph test;
CREATE GRAPH
staging=# set graph_path to test;
SET
staging=# create vlabel foo;
CREATE VLABEL
staging=# create vlabel bar;
CREATE VLABEL
staging=# create elabel blah;
CREATE ELABEL
staging=# create  (f:foo {id: 1}) create (b:bar {id: 2}) create (f)-[:blah]->(b);
UPDATE 3
staging=#  explain analyze match (f:foo)-[e]->() return f.id, id(e);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=37.00..64.12 rows=971 width=40) (actual time=0.033..0.035 rows=1 loops=1)
   Hash Cond: (e.start = f.id)
   ->  Append  (cost=0.00..24.56 rows=971 width=16) (actual time=0.007..0.008 rows=1 loops=1)
         ->  Seq Scan on ag_edge e_1  (cost=0.00..0.00 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=1)
         ->  Seq Scan on blah e_2  (cost=0.00..19.70 rows=970 width=16) (actual time=0.004..0.004 rows=1 loops=1)
   ->  Hash  (cost=22.00..22.00 rows=1200 width=40) (actual time=0.008..0.009 rows=1 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on foo f  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.002..0.003 rows=1 loops=1)
 Planning Time: 0.431 ms
 Execution Time: 0.052 ms
(10 rows)

Now this is the same query plan on AG 2.13.1:

staging=#  explain analyze match (f:foo)-[e]->() return f.id, id(e);

                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=37.74..381.05 rows=11657 width=40) (actual time=0.065..0.068 rows=1 loops=1)
   Hash Cond: (e.start = f.id)
   ->  Merge Join  (cost=0.74..313.34 rows=11657 width=16) (actual time=0.048..0.051 rows=1 loops=1)
         Merge Cond: (e."end" = "<0000000001>".id)
         ->  Merge Append  (cost=0.29..39.96 rows=971 width=24) (actual time=0.013..0.014 rows=1 loops=1)
               Sort Key: e."end", e.start
               ->  Index Scan using ag_edge_end_idx on ag_edge e_1  (cost=0.12..2.34 rows=1 width=24) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Index Scan using blah_end_idx on blah e_2  (cost=0.15..27.90 rows=970 width=24) (actual time=0.010..0.010 rows=1 loops=1)
         ->  Materialize  (cost=0.45..102.10 rows=2401 width=8) (actual time=0.031..0.033 rows=2 loops=1)
               ->  Merge Append  (cost=0.45..96.10 rows=2401 width=8) (actual time=0.027..0.029 rows=2 loops=1)
                     Sort Key: "<0000000001>".id
                     ->  Index Only Scan using ag_vertex_pkey on ag_vertex "<0000000001>_1"  (cost=0.12..2.34 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1)
                           Heap Fetches: 0
                     ->  Index Only Scan using foo_pkey on foo "<0000000001>_2"  (cost=0.15..31.35 rows=1200 width=8) (actual time=0.016..0.017 rows=1 loops=1)
                           Heap Fetches: 1
                     ->  Index Only Scan using bar_pkey on bar "<0000000001>_3"  (cost=0.15..31.35 rows=1200 width=8) (actual time=0.007..0.007 rows=1 loops=1)
                           Heap Fetches: 1
   ->  Hash  (cost=22.00..22.00 rows=1200 width=40) (actual time=0.006..0.006 rows=1 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on foo f  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.004..0.004 rows=1 loops=1)
Planning Time: 0.612 ms
Execution Time: 0.144 ms
(22 rows)

This costs an addition merge join, as well as a scan over every VLABEL table, which is a major issue for the performance of our queries that only require one vertex on an edge.

jshen-r7 avatar Dec 28 '23 18:12 jshen-r7

For reference, on a similar query the performance difference is 80 ms vs 5.3 seconds. This is for a modest result set with ~5k rows, and we have ~800 VLABELs total.

jshen-r7 avatar Dec 28 '23 19:12 jshen-r7