qlever
Write a proper SPARQL parser using ANTLR4
This subsumes many of the other issues below.
- First step:
- Implement a SPARQL parser that supports exactly the same subset as the current one, but with a better structure and correct, automated lexing.
@manonthegithub might also be interested in your progress on this, so that we don't duplicate the work.
@joka921 @manonthegithub on the internal fork we do have an ANTLR SPARQL grammar for the completion script that could be used for this.
I have seen this grammar and am already using it
Adding to this, the current SPARQL parser also breaks if there isn't a space before the `.` at the end of a triple, which is often the case for Wikidata examples.
OK, I tried quick-fixing the `.` issue because it just happens so often. It turns out SPARQL is quite weird here, because the `.` may appear inside literals, prefixed names and IRIs.
For example the following query works (in Blazegraph):
SELECT ?item WHERE {
?item wdt:P31 wd:Q2934.?item wdt:P39 wd:Q41240317
}
Using `^` to reverse the second triple, the following also works:
SELECT ?item WHERE {
?item wdt:P31 wd:Q2934. wd:Q41240317 ^wdt:P39 ?item
}
However, removing the space after the `.` breaks parsing, even though the space isn't needed at the same position when the `?` disambiguates. So yeah, we really should use a proper parser that naturally handles this weirdness.
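To make the `.` behaviour above concrete, here is a minimal sketch of a longest-match tokenizer. It is not QLever's or Blazegraph's lexer. It assumes an ASCII-only approximation of the SPARQL 1.1 rule that a prefixed name's local part may contain `.` and `:` but may not end in `.`; the names `tokenize` and `_TOKEN` are made up for illustration.

```python
import re

# ASCII-only approximation (an assumption of this sketch) of SPARQL 1.1
# prefixed names: '.' may occur inside the prefix or local part, never last.
_PN_PREFIX = r'[A-Za-z](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'
_PN_LOCAL = r'[A-Za-z0-9_:](?:[A-Za-z0-9_.:-]*[A-Za-z0-9_:-])?'

_TOKEN = re.compile(
    rf'(?P<PNAME>(?:{_PN_PREFIX})?:{_PN_LOCAL})'  # e.g. wd:Q2934
    r'|(?P<VAR>\?[A-Za-z0-9_]+)'                  # e.g. ?item
    r'|(?P<PUNCT>[.^{}])'                         # triple terminator, inverse path, braces
    r'|(?P<WS>\s+)'                               # whitespace, skipped below
)


def tokenize(text):
    """Split text into tokens, always taking the longest prefixed name."""
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize("wd:Q2934.?item"))          # '?' cannot continue a name
print(tokenize("wd:Q2934. wd:Q41240317"))  # the space ends the name
print(tokenize("wd:Q2934.wd:Q41240317"))   # one long prefixed name
```

Under these assumptions the first two inputs split at the `.`, while the third is swallowed as a single prefixed name. That is one plausible explanation for why the space matters in the `^` variant but not before `?item`.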
@joka921 note that the current ANTLR grammar doesn't support the predicate paths that #244 will soon add. I'll look into this so beware there will be some changes.
@floriankramer Just a note that this would also add `#` comments, which aren't supported by the new lexer either.
@niklas88 Although adding those into the lexer would be relatively easy (simply consume everything up to and including the next newline when a # is found outside of another token type).
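A rough sketch of that idea, written in Python rather than as part of QLever's lexer: a `#` starts a comment only when it is found outside another token, and the comment runs up to and including the next newline. The function name `strip_comments` and the simplified token patterns are assumptions of this sketch; long (triple-quoted) string literals are not handled.

```python
import re

# Simplified token kinds inside which '#' must not start a comment.
_IRIREF = r'<[^<>"{}|^`\\\x00-\x20]*>'     # e.g. <http://example.org/p#q>
_STRING_DQ = r'"(?:[^"\\\n\r]|\\.)*"'      # "short string"
_STRING_SQ = r"'(?:[^'\\\n\r]|\\.)*'"      # 'short string'
_COMMENT = r'#[^\n]*\n?'                   # '#' up to and including the newline

# Group 1: protected tokens (tried first), group 2: comments.
_SCANNER = re.compile(f'({_IRIREF}|{_STRING_DQ}|{_STRING_SQ})|({_COMMENT})')


def strip_comments(query: str) -> str:
    """Remove '#' comments that start outside IRIs and short string literals."""
    def keep_or_drop(m):
        if m.group(1) is not None:
            return m.group(1)  # IRI or string literal: keep verbatim
        # Comment: drop it, but keep a newline so neighbouring tokens stay apart.
        return "\n" if m.group(2).endswith("\n") else ""
    return _SCANNER.sub(keep_or_drop, query)


query = 'SELECT ?x WHERE { # find x\n  ?x <http://example.org/p#q> "a # b" . # trailing\n}'
print(strip_comments(query))
```

A real lexer would simply skip the comment token instead of rewriting the input. The sketch replaces each comment with a newline so that line numbers in later error messages still match the original query.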
Update:
We are finally making progress on this. We already have a complete grammar, and this issue is now assigned to @Qup42.
This has been done and it was indeed a milestone for QLever.