
How can I make an SQL parser with lark?

Open btseytlin opened this issue 5 years ago • 11 comments

What is your question?

I would like to create an SQL parser with lark. I understand that I need an SQL grammar for this. I would rather not make one myself. The readme states that the expected grammar format is EBNF.

How can I use, for example, one of these grammars: https://ronsavage.github.io/SQL/? I tried using them, but they are clearly in the wrong format.

If there is no way to use one of these, where can I find a grammar in a suitable format?

btseytlin avatar Sep 07 '20 11:09 btseytlin

I read one of their SQL grammars (sql-2003-2.bnf). It's indeed not in a format suitable for Lark. You'll have to:

  1. Extract the actual BNF from the HTML file (i.e. remove or comment out the markup)

  2. Convert the syntax to Lark's syntax: for example, ::= becomes :, rule names aren't surrounded by <>, etc.

And even then, you might have to make some adjustments to make it parse correctly.
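As a rough illustration of step 2 only, here is a minimal sketch that rewrites a single-line BNF production into Lark-style syntax. The helper names (bnf_name_to_lark, convert_bnf_line) are hypothetical, and the sketch assumes that bare uppercase words are SQL keywords and that each production fits on one line. It does not handle multi-line productions, { } groups, or the "..." repetition marker, so real grammar files will still need manual adjustment.

```python
import re

# Matches either a <bracketed rule name> or a bare uppercase keyword.
# The bracketed alternative comes first, so uppercase words inside <...>
# are consumed as part of the rule name, not treated as keywords.
_TOKEN = re.compile(r"<([^<>]+)>|\b([A-Z][A-Z_]*)\b")


def bnf_name_to_lark(name: str) -> str:
    """Turn a BNF name like 'query specification' into 'query_specification'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def _rewrite(match: re.Match) -> str:
    rule_name, keyword = match.groups()
    if rule_name is not None:
        return bnf_name_to_lark(rule_name)
    # Assumption: a bare uppercase word is a keyword; emit it as a
    # case-insensitive string literal.
    return f'"{keyword}"i'


def convert_bnf_line(line: str) -> str:
    """Convert one single-line production '<name> ::= body' to 'name: body'."""
    head, sep, body = line.partition("::=")
    name_match = re.fullmatch(r"\s*<([^<>]+)>\s*", head)
    if not sep or name_match is None:
        raise ValueError(f"not a single-line BNF production: {line!r}")
    lark_name = bnf_name_to_lark(name_match.group(1))
    return f"{lark_name}: {_TOKEN.sub(_rewrite, body).strip()}"


if __name__ == "__main__":
    print(convert_bnf_line(
        "<query specification> ::= SELECT [ <set quantifier> ] <select list>"
    ))
```

The square brackets are left untouched because both notations use [ ] for an optional item.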

I do offer freelance services for writing grammars, so if that's something you're interested in, we can discuss it.

But if you want to avoid writing the grammar, you should consider existing SQL parsers for Python, which unfortunately don't use Lark.

erezsh avatar Sep 07 '20 12:09 erezsh

@erezsh thank you!

btseytlin avatar Sep 07 '20 13:09 btseytlin

@glebmezh Sent you an email.

erezsh avatar Sep 15 '20 18:09 erezsh

@erezsh @btseytlin I’ve actually already written a SQL parser using lark; you can find it here. @erezsh I’ve been meaning to reach out to add it to the list in your README.

zbrookle avatar Sep 22 '20 01:09 zbrookle

@zbrookle Thanks, looks like a decent start.

Btw I noticed you're not allowing "FULL\nOUTER JOIN" and so on, but I think SQL does allow it.

erezsh avatar Sep 22 '20 06:09 erezsh

@erezsh Yeah, that's a good point. It doesn't really matter how much whitespace there is between the tokens.

zbrookle avatar Sep 22 '20 13:09 zbrookle

@erezsh I actually just tested this and it turns out that lark is including \n as part of \s. Not sure if that's a bug or not since I'm ignoring WS. If it isn't expected I can open an issue

zbrookle avatar Oct 04 '20 18:10 zbrookle

Well, yes, \n is part of the regex group \s. This is not a 'problem' of lark, but of the re library.
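This is easy to check with the re module alone. The sketch below uses a hypothetical pattern (JOIN_KEYWORD is made up for illustration and is not taken from any grammar in this thread) to show that \s+ between keywords already accepts a newline:

```python
import re

# \s matches any whitespace character, including newline and tab.
assert re.fullmatch(r"\s", "\n") is not None
assert re.fullmatch(r"\s", "\t") is not None

# Hypothetical multi-word keyword pattern, for illustration only.
JOIN_KEYWORD = re.compile(r"FULL\s+OUTER\s+JOIN", re.IGNORECASE)


def is_full_outer_join(text: str) -> bool:
    """Return True if text is exactly the keyword, with any whitespace between words."""
    return JOIN_KEYWORD.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("FULL OUTER JOIN", "FULL\nOUTER JOIN", "FULLOUTER JOIN"):
        print(repr(sample), is_full_outer_join(sample))
```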

MegaIng avatar Oct 04 '20 18:10 MegaIng

@MegaIng This was in reference to a place where I was using \s, and @erezsh pointed out that it should also accept \n. After testing, it does. So either the package doesn't work as expected, or the comment was perhaps incorrect.

zbrookle avatar Oct 04 '20 19:10 zbrookle

I think @erezsh made a mistake, but I am not sure.

MegaIng avatar Oct 04 '20 19:10 MegaIng

@zbrookle Not sure which comment that was, but legend has it that I can sometimes make mistakes.

Yes, \s includes \n.

erezsh avatar Oct 04 '20 20:10 erezsh