Halyard
Poor performance of nested OPTIONALs
When a SPARQL query contains nested OPTIONAL clauses, such as the following, its performance is poor, typically causing timeouts.
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT *
WHERE {
  {
    SELECT ?article
    WHERE {
      ?article a bibo:Article .
    }
    LIMIT 10
  }
  OPTIONAL {
    OPTIONAL {
      ?article dcterms:issued ?article_issued .
    }
  }
}
Output of Halyard Profile for this query:
Optimized query:
Projection [2,955,991,897,878,706.5]
   ProjectionElemList
      ProjectionElem "article"
      ProjectionElem "article_issued"
   LeftJoin [2,955,991,897,878,706.5]
      Slice ( limit=10 ) [3,614,563.841]
         Projection [3,614,563.841]
            ProjectionElemList
               ProjectionElem "article"
            StatementPattern [3,614,563.841]
               Var (name=article)
               Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
               Var (name=_const_6dd7acd3_uri, value=http://purl.org/ontology/bibo/Article, anonymous)
      LeftJoin [226.251]
         SingletonSet [1]
         StatementPattern [226.251]
            Var (name=article)
            Var (name=_const_884f353b_uri, value=http://purl.org/dc/terms/issued, anonymous)
            Var (name=article_issued)
The nested OPTIONAL in this query is unnecessary, but it reproduces the issue in a minimal way.
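For comparison, here is a sketch of the same query with the redundant outer OPTIONAL removed. Under standard SPARQL semantics it should return the same solutions as the query above, so it may serve as a baseline when comparing profile output. It has not been profiled here, so no claim is made about how it performs.

```sparql
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT *
WHERE {
  {
    # Same sub-select as above: pick 10 articles.
    SELECT ?article
    WHERE {
      ?article a bibo:Article .
    }
    LIMIT 10
  }
  # Single OPTIONAL replacing the nested OPTIONAL { OPTIONAL { ... } }.
  OPTIONAL {
    ?article dcterms:issued ?article_issued .
  }
}
```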
Mapping the nested OPTIONAL to a LeftJoin with a SingletonSet is correct and should not cause any issue. I see a minor issue with the cardinality of the sub-select with Slice; however, it does not affect the final query tree. I'm aware of some specific queries that cause performance issues, but unfortunately the cause is not as simple as a nested OPTIONAL. It requires further investigation.