rdf4j icon indicating copy to clipboard operation
rdf4j copied to clipboard

Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5

Open daltontc opened this issue 2 years ago • 11 comments

Current Behavior

In versions 3.7.4 and 4.1.0, iterating over a TupleQueryResult with large results is extremely slow when compared to older RDF4j versions (2.5.5).

With newer RDF4j versions, running the following query over the CHEBI Lite ontology stored in a graph takes over an hour to process results whereas in RDF4j 2.5.5 it only takes 4s.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}

See https://github.com/daltontc/rdf4jTest/tree/bug/tuplequeryresults_slow_iteration for an example

Expected Behavior

The time it takes to iterate over large query results should be relatively consistent between versions, if not better in later versions.

Steps To Reproduce

  1. Clone https://github.com/daltontc/rdf4jTest/tree/bug/tuplequeryresults_slow_iteration (requires Git LFS due to large trig file)
  2. Run Main with 2.5.5
  3. Results are iterated over in 4s
Total query time: 4228
Parent Count: 302455
Child Count: 122258
  1. Switch version to 3.7.4 or 4.1.0
  2. Run Main
  3. Results take ages to completely iterate over

Version

3.7.4, 4.1.0

Are you interested in contributing a solution yourself?

No response

Anything else?

No response

daltontc avatar Oct 12 '22 16:10 daltontc

Can you try to use the query explanation feature: https://rdf4j.org/documentation/programming/repository/#explaining-queries

The long runtime could be related to sub-optimal query optimization.

kenwenzel avatar Oct 13 '22 07:10 kenwenzel

I let the query actually finish out with 3.7.4 and it took nearly 24 hours. Crazy that the query used to run efficiently in 4 seconds but now takes nearly a full day.

Total query time: 83592008
Parent Count: 302455
Child Count: 122258

Here is the explanation:

Timed out while retrieving explanation! Explanation may be incomplete!
You can change the timeout by setting .setMaxExecutionTime(...) on your query.

Projection (resultSizeActual=38.2K, totalTimeActual=60.0s, selfTimeActual=12.2ms)
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator) (resultSizeActual=38.2K, totalTimeActual=59.9s, selfTimeActual=15.3ms)
   ├──Join (JoinIterator) (resultSizeActual=91, totalTimeActual=39.3ms, selfTimeActual=1.82ms)
   │  ╠══BindingSetAssignment ([[type=http://www.w3.org/2002/07/owl#Class], [type=http://www.w3.org/2000/01/rdf-schema#Class]]) (costEstimate=0, resultSizeEstimate=1, resultSizeActual=1, totalTimeActual=0.014ms, selfTimeActual=0.014ms)
   │  ╚══StatementPattern (costEstimate=430, resultSizeEstimate=185.1K, resultSizeActual=91, totalTimeActual=37.4ms, selfTimeActual=37.4ms)
   │        Var (name=parent)
   │        Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
   │        Var (name=type)
   └──Join (JoinIterator) (resultSizeActual=38.2K, totalTimeActual=59.9s, selfTimeActual=10.6s)
      ╠══StatementPattern (costEstimate=215, resultSizeEstimate=185.1K, resultSizeActual=12.1M, totalTimeActual=25.9s, selfTimeActual=25.9s)
      ║     Var (name=child)
      ║     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
      ║     Var (name=type)
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=191.4K, resultSizeActual=38.2K, totalTimeActual=23.3s, selfTimeActual=23.3s)
            Var (name=child)
            Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
            Var (name=parent)

If I switch the query to remove one of the values in the VALUES declaration, it runs just as slow:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    values ?type {owl:Class}
    ?parent rdf:type ?type .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type ?type .
    }
}

But by removing the VALUES declaration and subbing in just owl:Class it runs in 4s again.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}
Total query time: 4150
Parent Count: 302455
Child Count: 122258

Reworking the query to UNION both different VALUES has it run in the original 4s

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    {
        ?parent rdf:type owl:Class .
        optional {
            ?child rdfs:subClassOf ?parent ;
                   rdf:type owl:Class .
        }
    }
    UNION
    {
        ?parent rdf:type rdfs:Class .
        optional {
            ?child rdfs:subClassOf ?parent ;
                   rdf:type rdfs:Class .
        }
    }
}
Total query time: 4332
Parent Count: 302455
Child Count: 122258

It appears that there is an issue with VALUES in the query evaluation.

daltontc avatar Oct 13 '22 17:10 daltontc

Is it correct that the query you are having performance issues with is using a VALUES clause, but if you remove the VALUES clause then there is no performance difference?

hmottestad avatar Oct 14 '22 09:10 hmottestad

@hmottestad I would say that is accurate. These two essentially equivalent queries run in 23 hours and 4 seconds respectively.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    values ?type {owl:Class}
    ?parent rdf:type ?type .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type ?type .
    }
}
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}

daltontc avatar Oct 14 '22 14:10 daltontc

Probably related to:

  • https://github.com/eclipse/rdf4j/issues/1405
  • https://github.com/eclipse/rdf4j/issues/1642

hmottestad avatar Oct 15 '22 14:10 hmottestad

I just spent a fruitless two hours on this, tricky to see where the problem is. TBC.

abrokenjester avatar Nov 20 '22 04:11 abrokenjester

I'm trying to construct a benchmark that reproduces the performance difference, using a synthetic dataset (that hopefully is a little easier to work with than the full dataset provided by @daltontc ). Struggling to show any significant performance difference though. Will try and look closer at the shape of the dataset. @daltontc any insights from your end on how your data is structured? Ideally I'd like to generate a dataset that we can use to reproduce the performance issue, but not quite at the scale of your original set.

abrokenjester avatar Dec 18 '22 01:12 abrokenjester

There is a trig file in the repo that @daltontc linked too. It reproduces the issue for me.

I think it's an issue with the join order from the query planner. I'll explain more in a bit, will also hopefully push a commit to your branch.

hmottestad avatar Dec 18 '22 08:12 hmottestad

Without VALUES clause

Projection (resultSizeActual=505)
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator) (resultSizeActual=505)
   ├──StatementPattern (resultSizeEstimate=504, resultSizeActual=504)
   │     Var (name=parent)
   │     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
   │     Var (name=_const_7e619dcc_uri, value=http://www.w3.org/2002/07/owl#Class, anonymous)
   └──Join (JoinIterator) (resultSizeActual=2)
      ╠══StatementPattern (costEstimate=50, resultSizeEstimate=9.9K, resultSizeActual=2)
      ║     Var (name=child)
      ║     Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
      ║     Var (name=parent)
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=504, resultSizeActual=2)
-->         Var (name=child) <-- The ?child variable won't be bound until the "?child rdfs:subClassOf ?parent" statement is evaluated
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_7e619dcc_uri, value=http://www.w3.org/2002/07/owl#Class, anonymous)

With VALUES clause

Projection
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator)
   ├──Join (JoinIterator)
   │  ╠══BindingSetAssignment ([[type=http://www.w3.org/2002/07/owl#Class]]) (costEstimate=1, resultSizeEstimate=1)
   │  ╚══StatementPattern (costEstimate=504, resultSizeEstimate=504)
   │        Var (name=parent)
   │        Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
-->│        Var (name=type, value=http://www.w3.org/2002/07/owl#Class) <--- ?type is able to bind to the ?type below
   └──Join (JoinIterator)
      ╠══StatementPattern (costEstimate=11, resultSizeEstimate=508)
      ║     Var (name=child)
      ║     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
-->   ║     Var (name=type) <--- This ?type!
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=9.9K)
            Var (name=child)
            Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
            Var (name=parent)

Notice that the plans swap the bottom two joins. This is because I've "helped" the join optimizer by adding a lot of [] rdfs:subClassOf [] statements. I've also added 500 [] a owl:Class statements to add some cost to the ?child a ?type query line.

The join optimizer swaps the two last joins because ?type can be bound to the outer join in the query using VALUES.

See the inline comments marked with -->

hmottestad avatar Dec 18 '22 09:12 hmottestad

@jeenbroekstra I'm not going to dig any further right now. If you don't have time yourself then maybe I'll pick it up in a few days.

hmottestad avatar Dec 18 '22 09:12 hmottestad

@hmottestad nice work, thanks! I'll try and pick it up again later this week, but feel free to also dig in if you have the time :)

abrokenjester avatar Dec 18 '22 21:12 abrokenjester