age icon indicating copy to clipboard operation
age copied to clipboard

Gin Indexing Problem

Open arca1n opened this issue 1 year ago • 8 comments

Describe the bug GIN indexes on Specific properties does not seem to work. This is basing off of the issue #954

How are you accessing AGE (Command line, driver, etc.)?

  • [e.g. JDBC]

What data setup do we need to do?

...
SELECT create_graph('cypher_index');
SELECT create_vlabel('cypher_index', 'Movie');
CREATE INDEX imdb ON cypher_index."Movie" USING gin ((properties->'imdbRank'));
SELECT * from cypher('cypher_index', $$
CREATE
(:Movie {id: 'movie1', name: 'The Shawshank Redemption', imdbRank : 25}),
(:Movie {id: 'movie2', name: 'The Godfather', imdbRank : 60}),
(:Movie {id: 'movie3', name: 'The Dark Knight', imdbRank : 100})
$$) as (V agtype);
...
-- Explain plans are not using the index
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.imdbRank = 25 return n  $$) AS (a agtype);

Result of the above EXPLAIN query

"Seq Scan on ""Movie"" n  (cost=0.00..34.03 rows=6 width=32) (actual time=0.019..0.025 rows=1 loops=1)"
"  Filter: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(id, _label_name('91013'::oid, id), properties), '""imdbRank""'::agtype]) = '25'::agtype)"
  Rows Removed by Filter: 2
Planning Time: 0.058 ms
Execution Time: 0.037 ms

Another EXPLAIN query using the MATCH by property


SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {imdbRank: 25}) return n  $$) AS (a agtype);

Result from the above plan

"Seq Scan on ""Movie"" n  (cost=0.00..28.00 rows=1 width=32) (actual time=0.016..0.018 rows=1 loops=1)"
"  Filter: (properties @> agtype_build_map('imdbRank'::text, '25'::agtype))"
  Rows Removed by Filter: 2
Planning Time: 0.051 ms
Execution Time: 0.031 ms

What is the necessary configuration info needed? Just pg13 with the latest release of apache age

What is the command that caused the error?

SELECT * from cypher('my_graph_name', $$
  MATCH (a:Part {part_num: '123'}), (b:Part {part_num: '345'})
  CREATE (a)-[u:used_by { quantity: 1 }]->(b)
$$) as (a agtype);
ERROR:  something failed to execute

Expected behavior The query should be hitting the GIN index

Environment (please complete the following information):

  • Version: Postgres13 with the latest tag of Apache AGE compiled and added.

Additional context Based on the discussions reported in other issues posted earlier indexing seems to be a bit of a riddle. I have been able to add uniqueness based on a unique index on a property in the json be example, but that index doesn't get used when running match queries.

arca1n avatar Jun 26 '23 04:06 arca1n

This might be a duplicate of https://github.com/apache/age/issues/1000

ddrao avatar Jun 26 '23 17:06 ddrao

@MuhammadTahaNaveed @Zainab-Saad as they might be working on a related patch mentioned in #1000

jrgemignani avatar Jun 27 '23 23:06 jrgemignani

There are indeed multiple issues with indexing that you might be experiencing:

  • gin indices not used in WHERE clauses -> #1000
  • specific indices not being usedin MATCH and WHERE clauses -> https://github.com/apache/age/pull/212#issuecomment-1117999045?
  • unique indices not being used in MATCH or WHERE clauses

[edit] I just realized there is another indexing issue that maybe should be added here (but which I haven't encountered myself):

  • #1010

pdpotter avatar Jun 29 '23 13:06 pdpotter

@pdpotter Thank you for pulling these altogether in a nice concise list. I will have our team look at working on each of these items. @MuhammadTahaNaveed @Zainab-Saad @CapnSpek @dehowef Let us know if you might be interested in any of these :)

jrgemignani avatar Jun 29 '23 16:06 jrgemignani

@arca1n I believe this issue has been resolved in the master branch. Would you be able to verify it on your end?

jrgemignani avatar Oct 21 '23 00:10 jrgemignani

I tried the latest code in the master branch. Unfortunately, indexes on specific property indexes still aren't used in my reproduction (both on a numeric property as on a textual property).

Reproduction:

  • Run latest master branch code in docker container
git clone [email protected]:apache/age.git
cd age
docker build -f docker/Dockerfile . -t age_index
docker run -e POSTGRES_USER=age_index -e POSTGRES_PASSWORD=age_index -e POSTGRES_DB=age_index -p 5413:5432 age_index
  • Connect with this server (other shell)
psql -h 127.0.0.1 -p 5413 -U age_index -d age_index
  • Try to get specific indexes working
load 'age';
# LOAD
SET search_path = ag_catalog, "$user", public;
# SET
SELECT create_graph('cypher_index');
# NOTICE:  graph "cypher_index" has been created
# create_graph 
#--------------
# 
#(1 row)
#
SELECT create_vlabel('cypher_index', 'Movie');
#NOTICE:  VLabel "Movie" has been created
# create_vlabel 
#---------------
# 
#(1 row)
#
CREATE INDEX id ON cypher_index."Movie" USING gin ((properties->'id'::text));
#CREATE INDEX
CREATE INDEX imdb ON cypher_index."Movie" USING gin ((properties->'imdb'::text));
#CREATE INDEX
SELECT * from cypher('cypher_index', $$
CREATE
(:Movie {id: 1, name: 'The Shawshank Redemption', imdb : 'tt0111161'}),
(:Movie {id: 2, name: 'The Godfather', imdb : 'tt0068646'}),
(:Movie {id: 3, name: 'The Dark Knight', imdb : 'tt0468569'})
$$) as (V agtype);
# v 
#---
#(0 rows)
#
SET enable_seqscan = false;
# SET
SELECT * FROM cypher('cypher_index', $$ MATCH(n:Movie {id: 1}) return n  $$) AS (a agtype);
#                                                                  a                                                                  
#-------------------------------------------------------------------------------------------------------------------------------------
# {"id": 844424930131969, "label": "Movie", "properties": {"id": 1, "imdb": "tt0111161", "name": "The Shawshank Redemption"}}::vertex
#(1 row)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {id: 1}) return n  $$) AS (a agtype);
#                                                        QUERY PLAN                                                        
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n  (cost=10000000000.00..10000000025.00 rows=1 width=32) (actual time=49.585..49.589 rows=1 loops=1)
#   Filter: (properties @> '{"id": 1}'::agtype)
#   Rows Removed by Filter: 2
# Planning Time: 0.123 ms
# JIT:
#   Functions: 4
#   Options: Inlining true, Optimization true, Expressions true, Deforming true
#   Timing: Generation 0.507 ms, Inlining 17.602 ms, Optimization 27.453 ms, Emission 4.498 ms, Total 50.060 ms
# Execution Time: 50.140 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.id = 1 return n  $$) AS (a agtype);
#QUERY PLAN                                                        
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n  (cost=10000000000.00..10000000028.03 rows=6 width=32) (actual time=47.709..47.713 rows=1 loops=1)
#   Filter: (agtype_access_operator(VARIADIC ARRAY[properties, '"id"'::agtype]) = '1'::agtype)
#   Rows Removed by Filter: 2
# Planning Time: 0.114 ms
# JIT:
#   Functions: 4
#   Options: Inlining true, Optimization true, Expressions true, Deforming true
#   Timing: Generation 0.493 ms, Inlining 17.772 ms, Optimization 24.743 ms, Emission 5.166 ms, Total 48.174 ms
# Execution Time: 48.304 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {imdb: 'tt0111161'}) return n  $$) AS (a agtype);
#QUERY PLAN                                                        
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n  (cost=10000000000.00..10000000025.00 rows=1 width=32) (actual time=50.243..50.248 rows=1 loops=1)
#   Filter: (properties @> '{"imdb": "tt0111161"}'::agtype)
#   Rows Removed by Filter: 2
# Planning Time: 0.133 ms
# JIT:
#   Functions: 4
#   Options: Inlining true, Optimization true, Expressions true, Deforming true
#   Timing: Generation 0.473 ms, Inlining 18.220 ms, Optimization 27.395 ms, Emission 4.596 ms, Total 50.684 ms
# Execution Time: 50.766 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.imdb =  'tt0111161' return n  $$) AS (a agtype);
#QUERY PLAN                                                        
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n  (cost=10000000000.00..10000000028.03 rows=6 width=32) (actual time=40.654..40.662 rows=1 loops=1)
#   Filter: (agtype_access_operator(VARIADIC ARRAY[properties, '"imdb"'::agtype]) = '"tt0111161"'::agtype)
#   Rows Removed by Filter: 2
# Planning Time: 0.119 ms
# JIT:
#   Functions: 4
#   Options: Inlining true, Optimization true, Expressions true, Deforming true
#   Timing: Generation 0.449 ms, Inlining 13.580 ms, Optimization 21.965 ms, Emission 5.067 ms, Total 41.061 ms
# Execution Time: 41.163 ms
#(9 rows)
#

Do the specific property indexes need to be constructed differently?

pdpotter avatar Dec 11 '23 17:12 pdpotter

@pdpotter GIN indexes are usually created on the entire column rather than an expression-

CREATE INDEX id ON cypher_index."Movie" USING gin (properties);

rafsun42 avatar Dec 11 '23 17:12 rafsun42

@pdpotter I just want to point out that once an index is created, it is up to PostgreSQL to use it or not. It is entirely possible that the PostgreSQL planner/optimizer may decide the index would not improve the query and just ignore it.

jrgemignani avatar Dec 11 '23 19:12 jrgemignani

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 11 '24 00:05 github-actions[bot]

This issue was closed because it has been stalled for further 7 days with no activity.

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]