Add SQL grammar as example

Open andreasabel opened this issue 4 years ago • 5 comments

Here is a partial grammar of SQL that I'd like to add to the example suite: https://github.com/GrammaticalFramework/gf-contrib/blob/master/query-converter/MinSQL.bnf

SQL has case-insensitive keywords. This is a feature we could add to BNFC via some pragma, e.g.,

token case-insensitive keywords

as a special case of the more general

token case-insensitive <name> <regex>

andreasabel, Apr 29 '21 08:04

Case-insensitive keywords are a really useful feature! It would be better to add a --case-insensitive option to bnfc in addition to a token pragma.

In fact, I am facing the same problem with the Haskell backend (--text-token). I tried manually modifying the generated treeFind function in Lex.x:

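-- Compare upper-cased text so that the keyword lookup ignores case.
-- (s and tv are bound by the enclosing generated function.)
treeFind N = tv s
treeFind (B a t left right) | (Data.Text.toUpper s)  < (Data.Text.toUpper a) = treeFind left
                            | (Data.Text.toUpper s)  > (Data.Text.toUpper a) = treeFind right
                            | (Data.Text.toUpper s) == (Data.Text.toUpper a) = t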

It seems to work. However, it would be better to make this work with all backends via a simple pragma. I am really looking forward to seeing the feature!
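For anyone who wants to try the idea outside the generated lexer, here is a minimal, self-contained sketch of the same kind of lookup. It is not BNFC's generated code: Tok, BTree, insertKw, resWords and classify are hypothetical stand-ins, and it uses String with Data.Char.toLower rather than Data.Text so that it only needs base. The key is normalised once, both when the tree is built and when it is searched:

```haskell
import Data.Char (toLower)

-- Hypothetical token type, standing in for the lexer's real one.
data Tok = Keyword String | Ident String
  deriving (Eq, Show)

-- Binary search tree of reserved words, keyed by lower-cased spelling.
data BTree = N | B String Tok BTree BTree

-- Insert a keyword under its lower-cased key.
insertKw :: String -> BTree -> BTree
insertKw kw = go
  where
    key = map toLower kw
    go N = B key (Keyword key) N N
    go node@(B a t l r) = case compare key a of
      LT -> B a t (go l) r
      GT -> B a t l (go r)
      EQ -> node

-- Example keyword set (an assumption, not taken from MinSQL.bnf).
resWords :: BTree
resWords = foldr insertKw N ["SELECT", "FROM", "WHERE"]

-- Lower-case the lexeme once, then do an ordinary ordered lookup.
-- Non-keywords keep their original spelling.
classify :: String -> Tok
classify s = go resWords
  where
    key = map toLower s
    go N = Ident s
    go (B a t l r) = case compare key a of
      LT -> go l
      GT -> go r
      EQ -> t
```

Keying the tree by the normalised spelling matters here: if a tree is ordered by the original spellings but searched with case-folded comparisons, a keyword list that mixes cases can send the search down the wrong branch.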

Commelina, May 12 '21 09:05

> It would be better to add a --case-insensitive option to bnfc in addition to a token pragma.

I think case-insensitive keywords are a property of the language defined by the grammar rather than of how the grammar is processed. So I favor a pragma in the grammar file over a command-line option to bnfc. Options should configure the backends, not change the semantics of the grammar.

A shorter pragma would be

case-insensitive keywords;

andreasabel, May 19 '21 18:05

Would there be any use case for separating whether keywords are case-insensitive from whether token types are case-insensitive? For instance, strings are tokens, and usually they should record the case actually used. More generally, tokens are defined by regular expressions, which (compared with other languages/tools) are usually case-sensitive if you specify a literal character/string or an explicit range like "[a-z]" or "[A-Z]".

(When a regular expression needs case-insensitivity for more than an individual character such as "[Aa]", many of the predefined character classes (e.g. "alphabetic", "alphanumeric", "unicode alphabetic") already include both cases. There is also usually an option to make a string literal in a regex case-insensitive. That option usually applies to the whole regex, but we can imagine tagging individual literal sequences in BNFC's encoding, since it is structured rather than being a string with various escapes for regex features.)
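To make the per-character "[Aa]" idea concrete, here is a rough Haskell sketch that expands a keyword into a bracket-style pattern matching any casing. The function name caseInsensitiveRegex is made up for illustration, the output uses generic bracket-expression notation rather than BNFC's own regex syntax, and it assumes the keyword contains no regex metacharacters that would need escaping:

```haskell
import Data.Char (toLower, toUpper)

-- Expand each cased letter c into the class [Cc]; leave everything else as is.
-- Assumes the keyword contains no characters that need regex escaping.
caseInsensitiveRegex :: String -> String
caseInsensitiveRegex = concatMap expand
  where
    expand c
      | toLower c /= toUpper c = ['[', toUpper c, toLower c, ']']
      | otherwise              = [c]
```

For example, caseInsensitiveRegex "select" gives "[Ss][Ee][Ll][Ee][Cc][Tt]". Applying this per keyword, rather than to every token regex, is one way keyword case-insensitivity could stay separate from the case-sensitivity of other tokens.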

Would there ever be a case for marking individual keywords case-sensitive or not? E.g. X . Y ::= "ProperCase" String anycase "but THIS can be ANY case";?

ScottFreeCode, Oct 23 '21 13:10

I'm considering whipping up a workaround using some combination of define, internal and _ . so that the uppercase keywords become synonyms that are rewritten to their lowercase versions, or vice versa.

Is there a better way at this point? Any news or advice?
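For a sense of scale, here is a small Haskell sketch (caseVariants and plainVariants are made-up helper names, not part of BNFC) that counts how many synonyms such a workaround would need per keyword. It contrasts covering only the all-lowercase and all-uppercase spellings with covering every mixed-case spelling:

```haskell
import Data.Char (toLower, toUpper)
import Data.List (nub)

-- Every spelling obtained by independently varying the case of each letter.
caseVariants :: String -> [String]
caseVariants = mapM (\c -> nub [toLower c, toUpper c])

-- Only the all-lowercase and all-uppercase spellings.
plainVariants :: String -> [String]
plainVariants kw = nub [map toLower kw, map toUpper kw]
```

plainVariants "select" yields two spellings, while caseVariants "select" yields 64, so a synonym-based approach seems practical only if mixed-case keywords like "SeLeCt" do not need to be accepted.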

ScottFreeCode, Apr 12 '24 00:04