grammars-v4

[PlSql] "REM", "REMARK", "PRO", "PROMPT" cannot be used as identifiers

Open jiefei30 opened this issue 2 years ago • 6 comments

REM, REMARK, PRO, and PROMPT cannot be used as identifiers. For example:

SELECT REMARK FROM T1

ANTLR fails to parse this SQL, but it is valid SQL in Oracle. I checked PlSqlLexer.g4, and these words are not defined as keywords there. So what is going on?

jiefei30 avatar Nov 13 '23 11:11 jiefei30

It seems to be caused by this code:

// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve034.htm#SQPUG054
REMARK_COMMENT:      'REM' {this.IsNewlineAtPos(-4)}? 'ARK'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF -> channel(HIDDEN);

// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve032.htm#SQPUG052
PROMPT_MESSAGE:      'PRO' {this.IsNewlineAtPos(-4)}? 'MPT'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF;

line 2450 in PlSqlLexer.g4

jiefei30 avatar Nov 17 '23 03:11 jiefei30

CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))

This parses fine, but

CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), 
REMARK VARCHAR2(512))

this does not. The only difference is the \n before REMARK in the second statement.
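
To illustrate why the newline matters, here is a minimal Python sketch that imitates the REMARK_COMMENT rule with a regular expression. It is not the ANTLR lexer: the names REMARK_LIKE and swallowed_as_remark are made up for this example, and it assumes that IsNewlineAtPos(-4) is true when the character before REM is a newline or the start of input.

```python
import re

# Rough regex imitation of the REMARK_COMMENT lexer rule (an assumption-based
# sketch, not the real ANTLR behaviour):
#   ^             REM must start the input or directly follow a newline
#   REM(?:ARK)?   'REM' optionally followed by 'ARK'
#   (?: [^\r\n]*)? optional space plus the rest of the line
#   (?:\r?\n|\Z)  terminated by a newline or the end of input
REMARK_LIKE = re.compile(r"^REM(?:ARK)?(?: [^\r\n]*)?(?:\r?\n|\Z)", re.MULTILINE)


def swallowed_as_remark(sql: str) -> list[str]:
    """Return the pieces of text that the imitated rule would treat as a remark."""
    return [m.group(0) for m in REMARK_LIKE.finditer(sql)]


one_line = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))"
two_lines = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), \nREMARK VARCHAR2(512))"

print(swallowed_as_remark(one_line))   # []
print(swallowed_as_remark(two_lines))  # ['REMARK VARCHAR2(512))']
```

In the second statement the whole column definition is matched as a remark, so a lexer behaving like this would hide it and the parser would see the column list end right after the comma.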

jiefei30 avatar Nov 17 '23 03:11 jiefei30

@kaby76 Hello, I see you modified this code about 3 months ago. Could you please help me figure this out when you have time? Thank you so much!

jiefei30 avatar Nov 17 '23 04:11 jiefei30

The change I made is unrelated to this problem. All I did was rename self. to this. in those two rules in order to put the grammar into "target agnostic format".

REMARK_COMMENT was added long before, first here: https://github.com/antlr/grammars-v4/commit/3f0150f57505dde0792739e79c0030a8c912e425#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2324

A predicate was then added the same day here: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328

Then, it was changed again the day after to what it was until I changed it: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328

I will look at it over the weekend. My first impression, though, is that recognizing REM and PRO requires parser-state-aware lexing: it is not enough to check for a preceding newline character; you also have to verify that the word is not part of a statement. This is one of the things Antlr does not do well at all.
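
As a rough illustration of what "state-aware" means here, the following Python sketch labels each line of a script as a command or as SQL. It is purely hypothetical and not how the grammar or SQL*Plus works: classify_lines and the in_statement flag are invented for this example, and it assumes a statement ends at a line whose last character is ';' (it ignores PL/SQL blocks, string literals, and comments).

```python
COMMAND_WORDS = {"REM", "REMARK", "PRO", "PROMPT"}


def classify_lines(script: str) -> list[tuple[str, str]]:
    """Label each line 'command' or 'sql' using a crude statement-open flag."""
    labelled = []
    in_statement = False  # hypothetical stand-in for real parser state
    for line in script.splitlines():
        # The word must sit at the very start of the line to count as a command.
        first_word = line.split(" ", 1)[0].upper()
        if not in_statement and first_word in COMMAND_WORDS:
            labelled.append(("command", line))
            continue
        labelled.append(("sql", line))
        stripped = line.strip()
        if stripped:
            # Assumption: a statement stays open until a line ends with ';'.
            in_statement = not stripped.endswith(";")
    return labelled


script = (
    "REM setup script\n"
    "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16),\n"
    "REMARK VARCHAR2(512));\n"
    "PROMPT done"
)
for kind, text in classify_lines(script):
    print(kind, "|", text)
```

The third line starts with REMARK right after a newline, yet it is labelled as SQL because a statement is still open. A newline-only check cannot capture that distinction.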

kaby76 avatar Nov 17 '23 11:11 kaby76

@KvanTTT OK, I get it. Thanks for your reply. My current workaround is to temporarily remove these two lexer rules.

jiefei30 avatar Nov 19 '23 10:11 jiefei30

I think the problem here is that REMARK and PROMPT should be considered commands. They're not really comments. So, I think you're right, the rules should not be there in the lexer.

kaby76 avatar Nov 19 '23 10:11 kaby76