
SPARQL EBNF handling in Python

Open nicholascar opened this issue 3 years ago • 3 comments

SPARQL 1.1's grammar [1] is specified in a W3C variant of EBNF [2], while most Parsing Expression Grammar (PEG) tools out there for Python either understand standard (ISO) EBNF or ABNF, e.g. TatSu [3], or implement their own EBNF-like grammar, e.g. Lark [4].

Has anyone converted SPARQL W3C EBNF to a form that can be used with any known Python packages?

I'm keen to have a standard grammar file for SPARQL (and ultimately Turtle etc.) that RDFLib can use.

[1] https://www.w3.org/TR/rdf-sparql-query/#sparqlGrammar
[2] https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
[3] https://pypi.org/project/TatSu/
[4] https://lark-parser.readthedocs.io/en/latest/grammar.html

nicholascar Jul 06 '22 02:07

Quite some time ago I did a "one-off" Vim-script conversion from W3C EBNF to EBNF suitable for PEG.js. As I just threw regexps at the problem until I had something to continue manually from, I do not know how incomplete it is. At least it did the job for the TriG grammar.

It is straightforward to make a Python script with the same brute-force approach. I do not know if the resulting form is more "standard" (among the chaos of EBNF dialects). The differences may be simple enough to overcome by this crude method though (which avoids having to define an EBNF for the various EBNF dialects, in each dialect, and write a "scientifically correct" EBNF translator...).

niklasl Jul 06 '22 10:07

Dunno if it helps but I'm in the process of experimenting with using Lark-driven parsers for RDFLib's handling of the W3C-published EBNF expressions. I've found the Lark IDE quite useful.

In general, I've found the terminals to be the most tedious to get right but, once nailed, one common set of terminals covers all the RDF format EBNF expressions.

Lark's case-insensitive matching is useful at times, avoiding awkward terminals like:

BASE: ("B"|"b") ("A"|"a") ("S"|"s") ("E"|"e")
PREFIX: ("P"|"p") ("R"|"r") ("E"|"e") ("F"|"f") ("I"|"i") ("X"|"x")

by being able to use /BASE/i in the grammar rules.
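To make the equivalence concrete, here is a small stdlib-only sketch (helper names are made up for illustration). It generates the verbose per-letter form shown above from a keyword and checks that it accepts exactly the same spellings as a case-insensitive regular expression. As far as I know, Lark compiles regexp terminals with Python's `re` module by default, so `/BASE/i` should behave like `re.compile("BASE", re.IGNORECASE)`, but treat that as an assumption rather than a statement about Lark internals.

```python
import itertools
import re


def expand_keyword(keyword):
    """Spell out a case-insensitive keyword the verbose, per-letter way."""
    return " ".join(f'("{c.upper()}"|"{c.lower()}")' for c in keyword)


def verbose_regex(keyword):
    """The same per-letter alternation, written as a Python regex."""
    parts = (f"(?:{re.escape(c.upper())}|{re.escape(c.lower())})" for c in keyword)
    return re.compile("".join(parts))


def case_variants(keyword):
    """Every upper/lower-case spelling of the keyword."""
    choices = [(c.lower(), c.upper()) for c in keyword]
    return ["".join(combo) for combo in itertools.product(*choices)]


print("BASE: " + expand_keyword("BASE"))

verbose = verbose_regex("BASE")
compact = re.compile("BASE", re.IGNORECASE)  # what /BASE/i amounts to

# Both forms accept all 16 spellings of BASE and reject anything else.
for word in case_variants("BASE") + ["BASIS", "BAS"]:
    assert bool(verbose.fullmatch(word)) == bool(compact.fullmatch(word))
```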

The attached grammar and terminals (WIP) (sparql-lark.txt) parse successfully when pasted into the IDE with this (arbitrarily selected) query:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  $title
WHERE   { :book1  dc:title  $title }

ghost Jul 06 '22 11:07

Thanks @niklasl & @gjhiggins for the responses! I will try and spend a bit of time on a few existing RDFLib issues in the next fortnight and, if I can clear out my personal backlog, I'll then try for a SPARQL improvement push aiming to:

  • use modern PEG handling of SPARQL
  • support DESCRIBE properly
  • get ready for SPARQL 1.2
  • perhaps allow for SPARQL extension handling in a plugin manner
    • since there are some SPARQL extensions out there (syntax extensions, not functions)

nicholascar Jul 13 '22 23:07