age
age copied to clipboard
Gin Indexing Problem
Describe the bug GIN indexes on Specific properties does not seem to work. This is basing off of the issue #954
How are you accessing AGE (Command line, driver, etc.)?
- [e.g. JDBC]
What data setup do we need to do?
...
SELECT create_graph('cypher_index');
SELECT create_vlabel('cypher_index', 'Movie');
CREATE INDEX imdb ON cypher_index."Movie" USING gin ((properties->'imdbRank'));
SELECT * from cypher('cypher_index', $$
CREATE
(:Movie {id: 'movie1', name: 'The Shawshank Redemption', imdbRank : 25}),
(:Movie {id: 'movie2', name: 'The Godfather', imdbRank : 60}),
(:Movie {id: 'movie3', name: 'The Dark Knight', imdbRank : 100})
$$) as (V agtype);
...
-- Explain plans are not using the index
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.imdbRank = 25 return n $$) AS (a agtype);
Result of the above EXPLAIN query
"Seq Scan on ""Movie"" n (cost=0.00..34.03 rows=6 width=32) (actual time=0.019..0.025 rows=1 loops=1)"
" Filter: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(id, _label_name('91013'::oid, id), properties), '""imdbRank""'::agtype]) = '25'::agtype)"
Rows Removed by Filter: 2
Planning Time: 0.058 ms
Execution Time: 0.037 ms
Another EXPLAIN query using the MATCH by property
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {imdbRank: 25}) return n $$) AS (a agtype);
Result from the above plan
"Seq Scan on ""Movie"" n (cost=0.00..28.00 rows=1 width=32) (actual time=0.016..0.018 rows=1 loops=1)"
" Filter: (properties @> agtype_build_map('imdbRank'::text, '25'::agtype))"
Rows Removed by Filter: 2
Planning Time: 0.051 ms
Execution Time: 0.031 ms
What is the necessary configuration info needed? Just pg13 with the latest release of apache age
What is the command that caused the error?
SELECT * from cypher('my_graph_name', $$
MATCH (a:Part {part_num: '123'}), (b:Part {part_num: '345'})
CREATE (a)-[u:used_by { quantity: 1 }]->(b)
$$) as (a agtype);
ERROR: something failed to execute
Expected behavior The query should be hitting the GIN index
Environment (please complete the following information):
- Version: Postgres13 with the latest tag of Apache AGE compiled and added.
Additional context
Based on the discussions reported in other issues posted earlier indexing seems to be a bit of a riddle. I have been able to add uniqueness based on a unique index on a property
in the json be example, but that index doesn't get used when running match queries.
This might be a duplicate of https://github.com/apache/age/issues/1000
@MuhammadTahaNaveed @Zainab-Saad as they might be working on a related patch mentioned in #1000
There are indeed multiple issues with indexing that you might be experiencing:
- gin indices not used in WHERE clauses -> #1000
- specific indices not being usedin MATCH and WHERE clauses -> https://github.com/apache/age/pull/212#issuecomment-1117999045?
- unique indices not being used in MATCH or WHERE clauses
[edit] I just realized there is another indexing issue that maybe should be added here (but which I haven't encountered myself):
- #1010
@pdpotter Thank you for pulling these altogether in a nice concise list. I will have our team look at working on each of these items. @MuhammadTahaNaveed @Zainab-Saad @CapnSpek @dehowef Let us know if you might be interested in any of these :)
@arca1n I believe this issue has been resolved in the master branch. Would you be able to verify it on your end?
I tried the latest code in the master branch. Unfortunately, indexes on specific property indexes still aren't used in my reproduction (both on a numeric property as on a textual property).
Reproduction:
- Run latest master branch code in docker container
git clone [email protected]:apache/age.git
cd age
docker build -f docker/Dockerfile . -t age_index
docker run -e POSTGRES_USER=age_index -e POSTGRES_PASSWORD=age_index -e POSTGRES_DB=age_index -p 5413:5432 age_index
- Connect with this server (other shell)
psql -h 127.0.0.1 -p 5413 -U age_index -d age_index
- Try to get specific indexes working
load 'age';
# LOAD
SET search_path = ag_catalog, "$user", public;
# SET
SELECT create_graph('cypher_index');
# NOTICE: graph "cypher_index" has been created
# create_graph
#--------------
#
#(1 row)
#
SELECT create_vlabel('cypher_index', 'Movie');
#NOTICE: VLabel "Movie" has been created
# create_vlabel
#---------------
#
#(1 row)
#
CREATE INDEX id ON cypher_index."Movie" USING gin ((properties->'id'::text));
#CREATE INDEX
CREATE INDEX imdb ON cypher_index."Movie" USING gin ((properties->'imdb'::text));
#CREATE INDEX
SELECT * from cypher('cypher_index', $$
CREATE
(:Movie {id: 1, name: 'The Shawshank Redemption', imdb : 'tt0111161'}),
(:Movie {id: 2, name: 'The Godfather', imdb : 'tt0068646'}),
(:Movie {id: 3, name: 'The Dark Knight', imdb : 'tt0468569'})
$$) as (V agtype);
# v
#---
#(0 rows)
#
SET enable_seqscan = false;
# SET
SELECT * FROM cypher('cypher_index', $$ MATCH(n:Movie {id: 1}) return n $$) AS (a agtype);
# a
#-------------------------------------------------------------------------------------------------------------------------------------
# {"id": 844424930131969, "label": "Movie", "properties": {"id": 1, "imdb": "tt0111161", "name": "The Shawshank Redemption"}}::vertex
#(1 row)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {id: 1}) return n $$) AS (a agtype);
# QUERY PLAN
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n (cost=10000000000.00..10000000025.00 rows=1 width=32) (actual time=49.585..49.589 rows=1 loops=1)
# Filter: (properties @> '{"id": 1}'::agtype)
# Rows Removed by Filter: 2
# Planning Time: 0.123 ms
# JIT:
# Functions: 4
# Options: Inlining true, Optimization true, Expressions true, Deforming true
# Timing: Generation 0.507 ms, Inlining 17.602 ms, Optimization 27.453 ms, Emission 4.498 ms, Total 50.060 ms
# Execution Time: 50.140 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.id = 1 return n $$) AS (a agtype);
#QUERY PLAN
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n (cost=10000000000.00..10000000028.03 rows=6 width=32) (actual time=47.709..47.713 rows=1 loops=1)
# Filter: (agtype_access_operator(VARIADIC ARRAY[properties, '"id"'::agtype]) = '1'::agtype)
# Rows Removed by Filter: 2
# Planning Time: 0.114 ms
# JIT:
# Functions: 4
# Options: Inlining true, Optimization true, Expressions true, Deforming true
# Timing: Generation 0.493 ms, Inlining 17.772 ms, Optimization 24.743 ms, Emission 5.166 ms, Total 48.174 ms
# Execution Time: 48.304 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie {imdb: 'tt0111161'}) return n $$) AS (a agtype);
#QUERY PLAN
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n (cost=10000000000.00..10000000025.00 rows=1 width=32) (actual time=50.243..50.248 rows=1 loops=1)
# Filter: (properties @> '{"imdb": "tt0111161"}'::agtype)
# Rows Removed by Filter: 2
# Planning Time: 0.133 ms
# JIT:
# Functions: 4
# Options: Inlining true, Optimization true, Expressions true, Deforming true
# Timing: Generation 0.473 ms, Inlining 18.220 ms, Optimization 27.395 ms, Emission 4.596 ms, Total 50.684 ms
# Execution Time: 50.766 ms
#(9 rows)
#
SELECT * FROM cypher('cypher_index', $$ EXPLAIN ANALYZE MATCH(n:Movie) where n.imdb = 'tt0111161' return n $$) AS (a agtype);
#QUERY PLAN
#--------------------------------------------------------------------------------------------------------------------------
# Seq Scan on "Movie" n (cost=10000000000.00..10000000028.03 rows=6 width=32) (actual time=40.654..40.662 rows=1 loops=1)
# Filter: (agtype_access_operator(VARIADIC ARRAY[properties, '"imdb"'::agtype]) = '"tt0111161"'::agtype)
# Rows Removed by Filter: 2
# Planning Time: 0.119 ms
# JIT:
# Functions: 4
# Options: Inlining true, Optimization true, Expressions true, Deforming true
# Timing: Generation 0.449 ms, Inlining 13.580 ms, Optimization 21.965 ms, Emission 5.067 ms, Total 41.061 ms
# Execution Time: 41.163 ms
#(9 rows)
#
Do the specific property indexes need to be constructed differently?
@pdpotter GIN indexes are usually created on the entire column rather than an expression-
CREATE INDEX id ON cypher_index."Movie" USING gin (properties);
@pdpotter I just want to point out that once an index is created, it is up to PostgreSQL to use it or not. It is entirely possible that the PostgreSQL planner/optimizer may decide the index would not improve the query and just ignore it.
This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for further 7 days with no activity.