grammars-v4
Lua string parse error
NORMALSTRING
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
CHARSTRING
: '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
;
LONGSTRING
: '[' NESTED_STR ']'
;
fragment
EscapeSequence
: '\\' [abfnrtvz"'|$#\\] // World of Warcraft Lua additionally escapes |$#
| '\\' '\r'? '\n'
| DecimalEscape
| HexEscape
| UtfEscape
;
fragment
DecimalEscape
: '\\' Digit
| '\\' Digit Digit
| '\\' [0-2] Digit Digit
;
fragment
HexEscape
: '\\' 'x' HexDigit HexDigit
;
fragment
UtfEscape
: '\\' 'u{' HexDigit+ '}'
;
fragment
Digit
: [0-9]
;
fragment
HexDigit
: [0-9a-fA-F]
;
The above lexer rules do not correctly recognize the following strings:
a = "\CSubsideChange"
a = "\"CSubsideChange"
a = "\"CSubs\ideChange"
a = 'http:./.\./\'
a = "http:/.\./.\."
a = ".\.\."
a = "\\"
a = "/"
How should I modify the lexer grammar so that it recognizes these strings correctly?
I solved the problem with the following lexer rule:
fragment
EscapeSequence
: '\\' [abfnrtvz"'|$#\\] // World of Warcraft Lua additionally escapes |$#
| '\\' '\r'? '\n'
| '\\' .
| DecimalEscape
| HexEscape
| UtfEscape
;
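To compare the original EscapeSequence fragment with the variant that adds the '\\' . alternative, here is a minimal plain-Java sketch that hand-codes the NORMALSTRING/CHARSTRING rules for a single literal. It is an illustration only: the class ShortStringCheck and its methods are hypothetical names, it is not generated by ANTLR, and it only answers "is the whole input one string token?" without ANTLR's error recovery. With lenient = false it should agree with the maintainer's findings below; with lenient = true it accepts every sample except 'http:./.\./\', because the final \' is read as an escaped quote and the string is never closed.

```java
import java.util.List;

// Hypothetical hand-written mirror of the NORMALSTRING / CHARSTRING lexer rules.
// lenient = false: original EscapeSequence; lenient = true: adds the '\\' . alternative.
public class ShortStringCheck {

    // Single-character escapes from the first EscapeSequence alternative
    // (standard Lua plus the World of Warcraft extras |$#).
    private static final String SIMPLE_ESCAPES = "abfnrtvz\"'|$#\\";

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isHex(char c) {
        return isDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Length of the escape starting at the backslash s[i], or -1 if no alternative matches.
    private static int escapeLength(String s, int i, boolean lenient) {
        int n = s.length();
        if (i + 1 >= n) {
            return -1;                                  // backslash at end of input
        }
        char d = s.charAt(i + 1);
        if (SIMPLE_ESCAPES.indexOf(d) >= 0 || d == '\n') {
            return 2;                                   // simple escape or backslash-newline
        }
        if (d == '\r' && i + 2 < n && s.charAt(i + 2) == '\n') {
            return 3;                                   // backslash, CR, LF
        }
        if (isDigit(d)) {                               // DecimalEscape: 1 to 3 digits
            int len = 2;
            if (i + 2 < n && isDigit(s.charAt(i + 2))) {
                len = 3;
                if (d <= '2' && i + 3 < n && isDigit(s.charAt(i + 3))) {
                    len = 4;                            // third digit only after [0-2]
                }
            }
            return len;
        }
        if (d == 'x' && i + 3 < n && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))) {
            return 4;                                   // HexEscape
        }
        if (d == 'u' && i + 2 < n && s.charAt(i + 2) == '{') {
            int j = i + 3;                              // UtfEscape: u{ HexDigit+ }
            while (j < n && isHex(s.charAt(j))) {
                j++;
            }
            if (j > i + 3 && j < n && s.charAt(j) == '}') {
                return j - i + 1;
            }
        }
        return lenient ? 2 : -1;                        // the added '\\' . alternative
    }

    // True if the whole input is exactly one NORMALSTRING or CHARSTRING token.
    static boolean matches(String literal, boolean lenient) {
        int n = literal.length();
        if (n < 2 || (literal.charAt(0) != '"' && literal.charAt(0) != '\'')) {
            return false;
        }
        char quote = literal.charAt(0);
        int i = 1;
        while (i < n) {
            char c = literal.charAt(i);
            if (c == quote) {
                return i == n - 1;                      // closing quote must end the input
            }
            if (c == '\\') {
                int len = escapeLength(literal, i, lenient);
                if (len < 0) {
                    return false;
                }
                i += len;
            } else {
                i++;                                    // ordinary character
            }
        }
        return false;                                   // no closing quote
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
            "\"\\CSubsideChange\"", "\"\\\"CSubsideChange\"", "\"\\\"CSubs\\ideChange\"",
            "'http:./.\\./\\'", "\"http:/.\\./.\\.\"", "\".\\.\\.\"", "\"\\\\\"", "\"/\"");
        for (String s : samples) {
            System.out.printf("%-22s strict=%-5b lenient=%b%n", s, matches(s, false), matches(s, true));
        }
    }
}
```

The sketch checks escapes greedily; for acceptance this does not matter here, because any digits left over after a shorter decimal escape are ordinary string characters anyway.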
Feel free to suggest a merge request with your changes.
I don't think those rules are wrong. The lexer from this repo only trips over these input strings:
a = "\CSubsideChange"
a = "\"CSubs\ideChange"
a = 'http:./.\./\'
a = "http:/.\./.\."
a = ".\.\."
which my Lua interpreter also rejects:
Lua 5.4.6 Copyright (C) 1994-2023 Lua.org, PUC-Rio
> a = "\CSubsideChange"
stdin:1: invalid escape sequence near '"\C'
> a = "\"CSubs\ideChange"
stdin:1: invalid escape sequence near '""CSubs\i'
> a = 'http:./.\./\'
stdin:1: invalid escape sequence near ''http:./.\.'
> a = "http:/.\./.\."
stdin:1: invalid escape sequence near '"http:/.\.'
> a = ".\.\."
stdin:1: invalid escape sequence near '".\.'
The following input strings are properly tokenized:
a = "\"CSubsideChange"
a = "\\"
a = "/"
Tested with:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class Main {

    // Lexes one line of Lua source with the generated LuaLexer and prints each token
    private static void dump(String source) {
        LuaLexer lexer = new LuaLexer(CharStreams.fromString(source));
        CommonTokenStream stream = new CommonTokenStream(lexer);
        stream.fill(); // pull all tokens up to EOF
        System.out.printf("\n%s:\n", source);
        for (Token t : stream.getTokens()) {
            // Tokens without a symbolic name (such as '=') print as "null"
            System.out.printf(" %-20s '%s'%n",
                    LuaLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getText().replace("\n", "\\n"));
        }
    }

    public static void main(String[] args) {
        // Java-escaped Lua source; the comments show the Lua 5.4 error, if any
        String[] tests = {
            "a = \"\\CSubsideChange\"",          // stdin:1: invalid escape sequence near '"\C'
            "a = \"\\\"CSubsideChange\"",
            "a = \"\\\"CSubs\\ideChange\"",      // stdin:1: invalid escape sequence near '""CSubs\i'
            "a = 'http:./.\\./\\'",              // stdin:1: invalid escape sequence near ''http:./.\.'
            "a = \"http:/.\\./.\\.\"",           // stdin:1: invalid escape sequence near '"http:/.\.'
            "a = \".\\.\\.\"",                   // stdin:1: invalid escape sequence near '".\.'
            "a = \"\\\\\"",
            "a = \"/\""
        };
        for (String test : tests) {
            dump(test);
        }
    }
}
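Output (ANTLR's default console error listener writes the token recognition errors to stderr, which is why they show up grouped ahead of the token dumps):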
line 1:4 token recognition error at: '"\C'
line 1:20 token recognition error at: '"'
line 1:4 token recognition error at: '"\"CSubs\i'
line 1:22 token recognition error at: '"'
line 1:4 token recognition error at: ''http:./.\.'
line 1:16 token recognition error at: '\'
line 1:17 token recognition error at: '''
line 1:4 token recognition error at: '"http:/.\.'
line 1:16 token recognition error at: '\'
line 1:18 token recognition error at: '"'
line 1:4 token recognition error at: '".\.'
line 1:8 token recognition error at: '\'
line 1:10 token recognition error at: '"'
a = "\CSubsideChange":
NAME 'a'
null '='
NAME 'SubsideChange'
EOF '<EOF>'
a = "\"CSubsideChange":
NAME 'a'
null '='
NORMALSTRING '"\"CSubsideChange"'
EOF '<EOF>'
a = "\"CSubs\ideChange":
NAME 'a'
null '='
NAME 'deChange'
EOF '<EOF>'
a = 'http:./.\./\':
NAME 'a'
null '='
null '/'
EOF '<EOF>'
a = "http:/.\./.\.":
NAME 'a'
null '='
null '/'
null '.'
null '.'
EOF '<EOF>'
a = ".\.\.":
NAME 'a'
null '='
null '.'
EOF '<EOF>'
a = "\\":
NAME 'a'
null '='
NORMALSTRING '"\\"'
EOF '<EOF>'
a = "/":
NAME 'a'
null '='
NORMALSTRING '"/"'
EOF '<EOF>'
OMG, thank you bkiers, this was my mistake: I am dealing with Lua 5.1.4, not Lua 5.4.
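The version matters because Lua 5.1's lexer keeps the character after an unrecognized escape (so "\C" becomes "C"), while Lua 5.2 and later report "invalid escape sequence". That is why the catch-all '\\' . alternative fits a Lua 5.1.4 code base. The sketch below is a hypothetical summary, not taken from the grammar: it only checks whether the character right after a backslash starts an escape that a given standard Lua version accepts (the World of Warcraft extras |$# are left out), and it does not validate the digits that follow x, u{ or a decimal escape. Note that "accepted" in 5.1 does not mean "same meaning": there, a backslash followed by x41 is just the characters x41, since hex escapes only arrived in 5.2.

```java
// Hypothetical helper: which characters may follow a backslash in each Lua version.
public class LuaEscapeVersions {

    enum LuaVersion { LUA_5_1, LUA_5_2, LUA_5_3, LUA_5_4 }

    static boolean acceptsEscapeIntro(LuaVersion version, char c) {
        // Escapes shared by all four versions: letters, backslash, quotes,
        // backslash-newline and decimal escapes.
        if ("abfnrtv\\\"'\n\r".indexOf(c) >= 0 || (c >= '0' && c <= '9')) {
            return true;
        }
        if (version == LuaVersion.LUA_5_1) {
            return true;        // 5.1 silently keeps the character after an unknown escape
        }
        if (c == 'x' || c == 'z') {
            return true;        // hex escapes and the z escape were added in 5.2
        }
        // the u{XXX} escape was added in 5.3; everything else is an error from 5.2 on
        return c == 'u' && version.compareTo(LuaVersion.LUA_5_3) >= 0;
    }

    public static void main(String[] args) {
        for (LuaVersion version : LuaVersion.values()) {
            System.out.printf("%s: C=%b x=%b z=%b u=%b%n", version,
                    acceptsEscapeIntro(version, 'C'), acceptsEscapeIntro(version, 'x'),
                    acceptsEscapeIntro(version, 'z'), acceptsEscapeIntro(version, 'u'));
        }
    }
}
```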