Mixed index not used for equality checks if not all included property keys are used
- Version: 0.6.3 /
master(tried with the Docker image tag1.0.0-20230910-132431.0663c29) - Storage Backend: Berkeley
- Mixed Index Backend: Lucene
- Link to discussed bug: #3973
- Expected Behavior: The mixed index should be used.
- Current Behavior: Mixed index is not used.
- Steps to Reproduce: Create a mixed index with 2 property keys and try to query with just one, without using any predicates.
Detailed steps for reproduction
Create a minimal schema with two property keys and a mixed index which uses both property keys:
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7dfa0a09
gremlin> mgmt.makePropertyKey('name').dataType(String.class).make()
==>name
gremlin> mgmt.makePropertyKey('age').dataType(Integer.class).make()
==>age
gremlin> mgmt.buildIndex('nameAndAge', Vertex.class).addKey(mgmt.getPropertyKey('name')).addKey(mgmt.getPropertyKey('age')).buildMixedIndex('search')
==>nameAndAge
gremlin> mgmt.commit()
==>null
Verify that the schema was successfully created (especially that the index is ENABLED):
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@4a019f70
gremlin> mgmt.printSchema()
==>------------------------------------------------------------------------------------------------
Vertex Label Name | Partitioned | Static |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Edge Label Name | Directed | Unidirected | Multiplicity |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Property Key Name | Cardinality | Data Type |
---------------------------------------------------------------------------------------------------
name | SINGLE | class java.lang.String |
age | SINGLE | class java.lang.Integer |
---------------------------------------------------------------------------------------------------
Graph Index (Vertex) | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
nameAndAge | Mixed | false | search | name: ENABLED |
| | | | age: ENABLED |
---------------------------------------------------------------------------------------------------
Graph Index (Edge) | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Relation Index (VCI) | Type | Direction | Sort Key | Order | Status |
---------------------------------------------------------------------------------------------------
Search for vertices by using just one of the two property keys, without using any search predicates (so an equality check):
gremlin> g.V().has('name', 'test').profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test)]) 4.068 100.00
constructGraphCentricQuery 0.175
constructGraphCentricQuery 0.061
GraphCentricQuery 3.666
\_condition=(name = test)
\_orders=[]
\_isFitted=false
\_isOrdered=true
\_query=[]
scan 3.181
\_query=[]
\_fullscan=true
\_condition=VERTEX
>TOTAL - - 4.068 -
As can be seen, this does not use an index and instead performs a full graph scan.
However, it works if both property keys are used:
gremlin> g.V().has('name', 'test').has('age', 12).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test), age.eq(12)]) 41.993 100.00
constructGraphCentricQuery 3.957
GraphCentricQuery 32.800
\_condition=(name = test AND age = 12)
\_orders=[]
\_isFitted=false
\_isOrdered=true
\_query=[(age = 12)]:nameAndAge
\_index=nameAndAge
\_index_impl=search
backend-query 0 32.410
\_query=nameAndAge:[(age = 12)]:nameAndAge
>TOTAL - - 41.993 -
or if a search predicate like textContains is used:
gremlin> g.V().has('name',textContains('test')).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[name.textContains(test)]) 17.027 100.00
constructGraphCentricQuery 0.272
GraphCentricQuery 16.520
\_condition=(name textContains test)
\_orders=[]
\_isFitted=true
\_isOrdered=true
\_query=[(name textContains test)]:nameAndAge
\_index=nameAndAge
\_index_impl=search
backend-query 0 16.433
\_query=nameAndAge:[(name textContains test)]:nameAndAge
>TOTAL - - 17.027 -
This seems to be a bug as the Mixed Index docs mention that a) any combination of index property keys can be used and that b) equality checks are also supported:
Mixed indexes retrieve vertices or edges by any combination of previously added property keys. Mixed indexes provide more flexibility than composite indexes and support additional condition predicates beyond equality. On the other hand, mixed indexes are slower for most equality queries than composite indexes.