age icon indicating copy to clipboard operation
age copied to clipboard

Sorted queries do not utilise indices, when such are introduced

Open maximdvoynishnikov opened this issue 1 year ago • 9 comments

We have been experimenting recently with Apache AGE specifically with the main focus on performance in comparison with the Postgres DB.

Our very basic data model contains single vertex label AccessPoint. For the simplicity sake the only relevant property on AccessPoint label is the id value, which is also supposed to be frequently used by the application queries.

The dataset contains 50_013 AccessPoint vertexes.

We introduced several indices to see, which are actually being used by the queries of different type. For the completeness here are the indices we introduced on the AccessPoint table:

CREATE UNIQUE INDEX ap_id ON "test-graph"."AccessPoint" USING btree (agtype_access_operator(properties, '"id"'));
CREATE INDEX ap_gin_properties ON "test-graph"."AccessPoint" USING GIN(properties);
CREATE UNIQUE INDEX ap_v_id ON "test-graph"."AccessPoint"(id);

To sum up, we have BTREE indices on the properties -> id and on the vertex id, and we have a GIN index on properties.

The problem we have observed is the lack of utilisation of any indices, for a simple query like:

SELECT * FROM cypher(
  'test-graph', $$
    MATCH (ap:AccessPoint)
    WITH ap
    ORDER BY ap.id
    RETURN ap
  $$
) as (value agtype)

The query plan for the query above is:

Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=5631.49..9308.86 rows=29419 width=32) (actual time=1703.577..2113.010 rows=50013 loops=1)
  ->  Gather Merge  (cost=5631.49..9014.67 rows=29419 width=64) (actual time=1703.567..2085.553 rows=50013 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=4631.48..4705.03 rows=29419 width=64) (actual time=1533.703..1700.085 rows=25006 loops=2)
              Sort Key: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(ap.id, _label_name('20505'::oid, ap.id), ap.properties), '""id""'::agtype]))"
              Sort Method: external merge  Disk: 10848kB
              Worker 0:  Sort Method: external merge  Disk: 8512kB
              ->  Parallel Seq Scan on ""AccessPoint"" ap  (cost=0.00..2447.93 rows=29419 width=64) (actual time=2.513..560.890 rows=25006 loops=2)"
Planning Time: 7.458 ms
Execution Time: 2128.705 ms

For the comparison a standard SQL query:

SELECT * 
FROM "test-graph"."AccessPoint" 
ORDER BY agtype_access_operator(properties, '"id"');

utilizes the ap_id btree index on properties -> id, the query plan:

Index Scan using ap_id on "AccessPoint"  (cost=0.41..9687.64 rows=50013 width=288) (actual time=0.610..347.862 rows=50013 loops=1)
Planning Time: 1.683 ms
Execution Time: 364.441 ms

maximdvoynishnikov avatar Jan 25 '24 13:01 maximdvoynishnikov

@alteck45 What version of PostgreSQL and version of AGE are you running this on?

Additionally, these 2 issues covered a lot about indexes, albeit with WHERE, that might be of assistance.

#1235 #1000

jrgemignani avatar Jan 25 '24 18:01 jrgemignani

@jrgemignani PostgreSQL version 16.1 and AGE version 1.5.0

maximdvoynishnikov avatar Jan 26 '24 08:01 maximdvoynishnikov

@alteck45 Could you provide a simple sample set?

jrgemignani avatar Jan 27 '24 01:01 jrgemignani

@alteck45 Could you provide a simple sample set so that we can try to reproduce this?

jrgemignani avatar Feb 21 '24 22:02 jrgemignani

@alteck45 This index won't be used -

CREATE UNIQUE INDEX ap_id ON "test-graph"."AccessPoint" USING btree (agtype_access_operator(properties, '"id"'));

due to the query not matching agtype_access_operator(properties, '"id"') -

(agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(ap.id, _label_name('20505'::oid, ap.id), ap.properties), '""id""'::agtype]))

jrgemignani avatar Feb 22 '24 01:02 jrgemignani

@alteck45 I'm also a bit confused by ap.id what id are you referring to or expecting? An id that you set in the properties or the id of the vertex that is generated when the vertex is created?

jrgemignani avatar Feb 22 '24 01:02 jrgemignani

@alteck45 I'm not sure why it is expanding the query out with ap.id and ap.properties. I have a feeling this is not allowing some of the indexes to work. I will look at the GIN indexes next.

psql-16.1-5432-pgsql=# SELECT * FROM cypher(
  'test-graph', $$ EXPLAIN
    MATCH (ap:AccessPoint)
    RETURN ap ORDER BY id(ap)
  $$) as (value agtype);
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Subquery Scan on _  (cost=1.49..1.64 rows=12 width=32)
   ->  Sort  (cost=1.49..1.52 rows=12 width=64)
         Sort Key: (age_id(_agtype_build_vertex(ap.id, _label_name('16979'::oid, ap.id), ap.properties)))
         ->  Seq Scan on "AccessPoint" ap  (cost=0.00..1.27 rows=12 width=64)
(4 rows)

psql-16.1-5432-pgsql=#

jrgemignani avatar Feb 22 '24 21:02 jrgemignani

--Cypher query: Retriev all AccessPoint vertices ordered by id, utilizing BTREE index on vertex id-----

SELECT * FROM cypher( 'test-graph', $$ MATCH (ap:AccessPoint) WITH ap ORDER BY id(ap) RETURN ap $$ ) as (value agtype);

sir-sa avatar Apr 03 '24 05:04 sir-sa

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]

This issue was closed because it has been stalled for further 14 days with no activity.

github-actions[bot] avatar Jun 02 '24 00:06 github-actions[bot]