grammars-v4

DELIMITER support in MySQL grammar

Open scadgek opened this issue 6 years ago • 7 comments

The current MySQL grammar has no rules for parsing the DELIMITER command. This causes problems when, for example, parsing the output of mysqldump: the utility wraps all function/procedure declarations in DELIMITER ;; ... DELIMITER ; commands, so a parser built from the grammar fails at that point.

I started adding support for the DELIMITER command to the grammar, but then realized it won't help much: once the command sets a non-default delimiter, the parser still fails on the next statement, which ends with the new one. I read that ANTLR4 can somehow adjust the grammar conditionally, so it's probably possible to make it change the delimiter after encountering the DELIMITER command. But then it's more of a design question: the grammar would be executing conditional logic that could to some extent be considered parsing, which is something grammars ideally shouldn't do. The workaround I'm going with is to preprocess the output of mysqldump and cut out everything enclosed in DELIMITER commands (I don't need to parse functions for my purposes anyway).
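The preprocessing workaround described above could look roughly like the following. This is a minimal sketch, assuming every DELIMITER command sits on its own line and that the default delimiter is `;`; the function name `strip_delimiter_blocks` is made up for illustration.

```python
import re

# Matches a client-side DELIMITER line such as "DELIMITER ;;" (case-insensitive).
DELIMITER_LINE = re.compile(r"^\s*DELIMITER\s+(\S+)\s*$", re.IGNORECASE)


def strip_delimiter_blocks(dump, default=";"):
    """Drop every line from a non-default DELIMITER command up to and
    including the DELIMITER command that restores the default."""
    kept = []
    skipping = False
    for line in dump.splitlines():
        match = DELIMITER_LINE.match(line)
        if match:
            # A non-default delimiter opens a block; the default one closes it.
            skipping = match.group(1) != default
            continue  # the DELIMITER line itself is never kept
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```

The remaining text contains only statements terminated by the default delimiter, so it can be handed to the parser unchanged. Routine bodies are lost, which is acceptable only when they do not need to be parsed.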

So the question is whether the MySQL grammar should support the DELIMITER command and, if so, how. I would much appreciate any comments from @KvanTTT and @parrt, as the creators of the MySQL grammar and of ANTLR4 itself, respectively.

Nov 24 '17 13:11 scadgek

/cc @ivan-khudyashev

Nov 28 '17 13:11 KvanTTT

Hello, scadgek.

DELIMITER really is a problem. If we want to parse DELIMITER statements, the resulting language (the other MySQL statements plus the DELIMITER statement) can only be parsed with a context-sensitive grammar. ANTLR4 can do that, but it requires semantic predicates, and once you use semantic predicates the grammar depends on the runtime (Java, Python, C++, whatever). The goal of the CURRENT version of this grammar is to have NO runtime dependency, and the price is that we cannot parse the DELIMITER statement.

Of course, the grammar can be improved for any particular runtime; ANTLR4 has this ability. But we cannot do that because of our goal. If you can, that would be great :)

One more thing. For exactly your case:

delimiter ;;
--some code of routine ;;
delimiter ;

it is possible to use these rules:

PARSER file

sqlStatement
    : ddlStatement | dmlStatement | transactionStatement
    | replicationStatement | preparedStatement
    | administrationStatement | utilityStatement
    | clientStatement
    ;
clientStatement
    : DELIMITER SEMI SEMI?
    ;

LEXER file

DELIMITER:    'DELIMITER';
SEMI:         ';';

Dec 01 '17 09:12 ivan-khudyashev

Or parse the delimiter/delimiter block as a separate mode.

Jul 14 '19 13:07 druud

I'm pretty late here, but want to add some important information: the DELIMITER keyword is not part of the MySQL language, instead it is a construct used by the MySQL console (and other client tools like MySQL Workbench) to allow parsing multiple statements, which internally contain the semicolon (which is part of the language).

The correct approach here is to split multiple statements according to the current delimiter and then feed them one by one to the parser. Here's an example of a statement splitter written in C++ that can split a million lines in less than a second.
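To illustrate the splitting approach, here is a minimal Python sketch. It is not the C++ splitter mentioned above; the function name `split_statements` is made up, and it assumes that DELIMITER commands appear at the start of a statement and end at the end of their line. It skips quoted strings but, as a simplification, ignores comments and backslash escapes.

```python
import re

# Client-side command, e.g. "DELIMITER ;;", terminated by the end of the line.
DELIM_CMD = re.compile(r"DELIMITER[ \t]+(\S+)[ \t]*(?:\r?\n|$)", re.IGNORECASE)


def split_statements(script, delimiter=";"):
    """Split a client script into statements, honouring DELIMITER commands."""
    statements = []
    buf = []             # characters of the statement being collected
    has_content = False  # True once buf holds a non-whitespace character
    quote = None         # currently open quote character, if any
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            # Inside a quoted string: copy verbatim until the closing quote.
            buf.append(ch)
            if ch == quote:
                quote = None
            i += 1
            continue
        if not has_content:
            # DELIMITER is only recognised where a new statement would start.
            m = DELIM_CMD.match(script, i)
            if m:
                delimiter = m.group(1)
                buf = []
                i = m.end()
                continue
        if script.startswith(delimiter, i):
            text = "".join(buf).strip()
            if text:
                statements.append(text)
            buf, has_content = [], False
            i += len(delimiter)
            continue
        if ch in "'\"`":
            quote = ch
        if not ch.isspace():
            has_content = True
        buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned string is a single statement without its terminator, so it can be fed to the parser on its own; the DELIMITER commands themselves never reach the grammar.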

Sep 07 '19 11:09 mike-lischke

@codeFather2, please take note of this before refactoring the grammar.

Sep 07 '19 12:09 KvanTTT

I have an idea for supporting the DELIMITER command. As @mike-lischke says, DELIMITER is not part of the MySQL language but is used by the console. But sometimes we receive it anyway, for example in the output of mysqldump or other tools, and we execute that output with a JDBC client, so we want the ANTLR parser to recognize this keyword rather than report errors. We can follow these steps:

  1. Add the DELIMITER keyword to lexer.g4.

  2. I think DELIMITER is an independent statement, and once a DELIMITER statement has been written there are two terminators (including ;), so we need a function that dynamically records the current terminator and changes the normal statement terminator.

  3. Finally, define that function in the grammar's members section; I use Go.
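The bookkeeping in steps 2 and 3 amounts to a small piece of state that a semantic predicate could consult. The sketch below shows that state in Python rather than Go, and is only an illustration: the class and method names are hypothetical, and how the predicate is wired into the grammar is not shown.

```python
class TerminatorState:
    """Tracks the statement terminator currently in effect (hypothetical helper)."""

    DEFAULT = ";"

    def __init__(self):
        self.current = self.DEFAULT

    def set_terminator(self, text):
        """Call when a DELIMITER statement has been recognised."""
        self.current = text

    def is_terminator(self, text):
        """What a semantic predicate would ask about the upcoming token text."""
        # Assumption taken from step 2: ';' stays valid next to the custom terminator.
        return text == self.current or text == self.DEFAULT
```

Because the grammar would call into code like this, it becomes tied to one target runtime, which is the trade-off raised earlier in the thread.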

Use the ANTLR preview in the GoLand plugin to see the final result.

By the way, this does not solve the double-semicolon case; for now I handle that case in the upper-layer application.

Apr 25 '24 03:04 daLingZhong

Just because you can, doesn't mean you should. Using the parser to separate statements is way slower than a simple splitter (1:100 if not more). If all you need is to parse SQL (say, for error checking) and you don't need anything else, then go for this solution. But as soon as you need anything else (like statement positions, code completion etc.) this solution will not help.

Apr 25 '24 07:04 mike-lischke