qlever
Handling of escaped characters (ECHAR in the SPARQL/Turtle grammar) is wrong in the SparqlParser, the TurtleParser, and the regex filter parser
The following query takes 164 seconds on http://qlever.informatik.uni-freiburg.de/Wikidata_Full :
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>
SELECT ?x WHERE {
?x rdf:type schema:Article .
FILTER regex(?x, "^<https://en.wikipedia.org/wiki/Albert_Ein")
}
Here are the specs from the execution tree:
FILTER ?x regex ^<https://en.wikipedia.org/wiki/Albert_Ein
Size: 29 x 1
Cols: ?x
Time: 162,932ms
INDEX SCAN ?x <Article>
Size: 67,677,598 x 1
Cols: ?x
Time: 248ms
I thought that a prefix regex FILTER is implemented by doing one or two binary searches on the sorted IDs and then materializing strings only for the result IDs (only 29 in this case).
However, the high query time indicates that the strings are looked up for all 67,677,598 IDs. Why?
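For reference, the expected behavior can be sketched roughly as follows. This is only an illustration of the idea, not QLever's actual code: `prefix_range` and the toy vocabulary are hypothetical, and it assumes the vocabulary is sorted by plain code-point order and that the prefix does not end in the highest possible code point.

```python
from bisect import bisect_left

def prefix_range(sorted_vocab, prefix):
    """Return (lo, hi) so that sorted_vocab[lo:hi] holds exactly the entries
    that start with `prefix`. Uses two binary searches and never touches
    entries outside the boundaries being probed."""
    if not prefix:
        return 0, len(sorted_vocab)
    # First entry that is >= prefix.
    lo = bisect_left(sorted_vocab, prefix)
    # Smallest string that is greater than every string starting with prefix:
    # bump the last character by one code point (assumption: no overflow).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_vocab, upper)
    return lo, hi

# Toy vocabulary standing in for the sorted index (hypothetical data).
vocab = sorted([
    "<a/Albert_Eiffel>",
    "<a/Albert_Ein>",
    "<a/Albert_Einstein>",
    "<a/Alberta>",
    "<b/x>",
])
lo, hi = prefix_range(vocab, "<a/Albert_Ein")
matches = vocab[lo:hi]
```

With this approach only the IDs in `[lo, hi)` need their strings looked up, which is why a query time proportional to all 67,677,598 rows is surprising.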
Fixed via #295
@joka921 I don't understand how this is fixed by #295 (which was about the pattern trick not being used in some cases). The problem for the query above is that the prefix FILTER takes forever, although it could be fast.
I just tried the query again on the current version (where #295 has been incorporated) and the problem is still there.
I had a look at this and found the following:
- The actual problem is simple: "^<https://en.wikipedia.org/wiki/Albert_Ein" is not a simple prefix regex, because it contains a "." which means "match any character". So the actual behavior in your case is correct.
- You probably wanted to escape the ".". To my understanding this should be done with two backslashes, one for SPARQL and one for the regex engine, so: FILTER regex(?x, "^<https://en\\.wikipedia\\.org/wiki/Albert_Ein")
- This escaping is broken on very many levels in the current parsing: the actual lexing regex is wrong, the handling of the escapes in the regex filter parser is wrong, and the SPARQL escape handling is currently nonexistent. I will have a closer look at this.
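To make the two escaping levels concrete, here is a minimal sketch of how they could be handled. It is not QLever's parser: `sparql_unescape` and `literal_prefix` are hypothetical helpers, the first one covers only the ECHAR escapes from the SPARQL grammar (`\t \b \n \r \f \" \' \\`), and the second one is a deliberately conservative check that treats any unescaped regex metacharacter as "not a simple prefix".

```python
# ECHAR from the SPARQL grammar: '\' followed by one of t b n r f \ " '
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

# Characters with special meaning in a regex (conservative list).
REGEX_META = set(".^$*+?()[]{}|\\")

def sparql_unescape(body):
    """Level 1: resolve SPARQL string escapes in the body of a literal."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ECHAR:
                raise ValueError("invalid escape in SPARQL literal")
            out.append(ECHAR[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def literal_prefix(regex):
    """Level 2: if `regex` is '^' followed only by literal characters or
    backslash-escaped metacharacters, return the literal prefix, else None."""
    if not regex.startswith("^"):
        return None
    out, i = [], 1
    while i < len(regex):
        c = regex[i]
        if c == "\\":
            if i + 1 < len(regex) and regex[i + 1] in REGEX_META:
                out.append(regex[i + 1])
                i += 2
            else:
                return None  # e.g. \d or a trailing backslash
        elif c in REGEX_META:
            return None  # unescaped '.', '*', ... so a full regex scan is needed
        else:
            out.append(c)
            i += 1
    return "".join(out)

# Literal bodies exactly as they would appear between the quotes in the query.
escaped = r"^<https://en\\.wikipedia\\.org/wiki/Albert_Ein"
unescaped = "^<https://en.wikipedia.org/wiki/Albert_Ein"

prefix_escaped = literal_prefix(sparql_unescape(escaped))
prefix_unescaped = literal_prefix(sparql_unescape(unescaped))
```

Under these assumptions the doubly escaped pattern reduces to a plain prefix that a binary search could serve, while the original pattern is rejected because of the unescaped "." characters.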