janusgraph icon indicating copy to clipboard operation
janusgraph copied to clipboard

Mixed index not used for equality checks if not all included property keys are used

Open FlorianHockmann opened this issue 2 years ago • 0 comments

  • Version: 0.6.3 / master (tried with the Docker image tag 1.0.0-20230910-132431.0663c29)
  • Storage Backend: Berkeley
  • Mixed Index Backend: Lucene
  • Link to discussed bug: #3973
  • Expected Behavior: The mixed index should be used.
  • Current Behavior: Mixed index is not used.
  • Steps to Reproduce: Create a mixed index with 2 property keys and try to query with just one, without using any predicates.

Detailed steps for reproduction

Create a minimal schema with two property keys and a mixed index which uses both property keys:

gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7dfa0a09
gremlin> mgmt.makePropertyKey('name').dataType(String.class).make()
==>name
gremlin> mgmt.makePropertyKey('age').dataType(Integer.class).make()
==>age
gremlin> mgmt.buildIndex('nameAndAge', Vertex.class).addKey(mgmt.getPropertyKey('name')).addKey(mgmt.getPropertyKey('age')).buildMixedIndex('search')
==>nameAndAge
gremlin> mgmt.commit()
==>null

Verify that the schema was successfully created (especially that the index is ENABLED):

gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@4a019f70
gremlin> mgmt.printSchema()
==>------------------------------------------------------------------------------------------------
Vertex Label Name              | Partitioned | Static                                             |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Edge Label Name                | Directed    | Unidirected | Multiplicity                         |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Property Key Name              | Cardinality | Data Type                                          |
---------------------------------------------------------------------------------------------------
name                           | SINGLE      | class java.lang.String                             |
age                            | SINGLE      | class java.lang.Integer                            |
---------------------------------------------------------------------------------------------------
Graph Index (Vertex)           | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
nameAndAge                     | Mixed       | false     | search         | name:         ENABLED |
                               |             |           |                | age:          ENABLED |
---------------------------------------------------------------------------------------------------
Graph Index (Edge)             | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Relation Index (VCI)           | Type        | Direction | Sort Key       | Order    |     Status |
---------------------------------------------------------------------------------------------------

Search for vertices by using just one of the two property keys, without using any search predicates (so an equality check):

gremlin> g.V().has('name', 'test').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test)])                                                             4.068   100.00
  constructGraphCentricQuery                                                                   0.175
  constructGraphCentricQuery                                                                   0.061
  GraphCentricQuery                                                                            3.666
    \_condition=(name = test)
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[]
    scan                                                                                       3.181
    \_query=[]
    \_fullscan=true
    \_condition=VERTEX
                                            >TOTAL                     -           -           4.068        -

As can be seen, this does not use an index and instead performs a full graph scan.

However, it works if both property keys are used:

gremlin> g.V().has('name', 'test').has('age', 12).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test), age.eq(12)])                                                41.993   100.00
  constructGraphCentricQuery                                                                   3.957
  GraphCentricQuery                                                                           32.800
    \_condition=(name = test AND age = 12)
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[(age = 12)]:nameAndAge
    \_index=nameAndAge
    \_index_impl=search
    backend-query                                                      0                      32.410
    \_query=nameAndAge:[(age = 12)]:nameAndAge
                                            >TOTAL                     -           -          41.993        -

or if a search predicate like textContains is used:

gremlin> g.V().has('name',textContains('test')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.textContains(test)])                                                  17.027   100.00
  constructGraphCentricQuery                                                                   0.272
  GraphCentricQuery                                                                           16.520
    \_condition=(name textContains test)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=[(name textContains test)]:nameAndAge
    \_index=nameAndAge
    \_index_impl=search
    backend-query                                                      0                      16.433
    \_query=nameAndAge:[(name textContains test)]:nameAndAge
                                            >TOTAL                     -           -          17.027        -

This seems to be a bug as the Mixed Index docs mention that a) any combination of index property keys can be used and that b) equality checks are also supported:

Mixed indexes retrieve vertices or edges by any combination of previously added property keys. Mixed indexes provide more flexibility than composite indexes and support additional condition predicates beyond equality. On the other hand, mixed indexes are slower for most equality queries than composite indexes.

FlorianHockmann avatar Sep 13 '23 14:09 FlorianHockmann