grammars-v4

Lua string parse error

Open • Dongyang0810 opened this issue 9 months ago • 4 comments

NORMALSTRING
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

CHARSTRING
    : '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
    ;

LONGSTRING
    : '[' NESTED_STR ']'
    ;

fragment
EscapeSequence
    : '\\' [abfnrtvz"'|$#\\]   // World of Warcraft Lua additionally escapes |$# 
    | '\\' '\r'? '\n'
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

fragment
DecimalEscape
    : '\\' Digit
    | '\\' Digit Digit
    | '\\' [0-2] Digit Digit
    ;

fragment
HexEscape
    : '\\' 'x' HexDigit HexDigit
    ;

fragment
UtfEscape
    : '\\' 'u{' HexDigit+ '}'
    ;

fragment
Digit
    : [0-9]
    ;

fragment
HexDigit
    : [0-9a-fA-F]
    ;

The above lexer rules do not correctly recognize the following strings:

a = "\CSubsideChange"
a = "\"CSubsideChange"
a = "\"CSubs\ideChange"
a = 'http:./.\./\'
a = "http:/.\./.\."
a = ".\.\."
a = "\\"
a = "/"

How should I modify the lexer grammar so that these strings are recognized correctly?

Dongyang0810 · Sep 22 '23 09:09

I solved the problem by adding a catch-all alternative ('\\' .) to the EscapeSequence rule:

fragment
EscapeSequence
    : '\\' [abfnrtvz"'|$#\\]   // World of Warcraft Lua additionally escapes |$#
    | '\\' '\r'? '\n'
    | '\\' .
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

Dongyang0810 · Sep 22 '23 11:09
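
To see what the extra '\\' . alternative changes, the NORMALSTRING rule can be approximated outside ANTLR with plain regular expressions. This is only a sketch under stated assumptions: the class name EscapeRuleSketch and both patterns are hypothetical hand translations of the rules quoted above, not code from the grammars-v4 repository, and a regex match is not guaranteed to behave exactly like the generated lexer.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex approximations of the NORMALSTRING rule,
// with and without the catch-all escape alternative.
final class EscapeRuleSketch {

    // Approximates the original EscapeSequence fragment.
    static final String STRICT_ESCAPE =
            "\\\\(?:[abfnrtvz\"'|$#\\\\]"   // single-character escapes, incl. the WoW extras
            + "|\\r?\\n"                     // backslash followed by a line break
            + "|[0-2][0-9]{2}|[0-9]{1,2}"    // DecimalEscape
            + "|x[0-9a-fA-F]{2}"             // HexEscape
            + "|u\\{[0-9a-fA-F]+\\})";       // UtfEscape

    // Adds the catch-all alternative: a backslash followed by any character.
    static final String LENIENT_ESCAPE = STRICT_ESCAPE + "|\\\\[\\s\\S]";

    // '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    static Pattern normalString(String escape) {
        return Pattern.compile("\"(?:" + escape + "|[^\\\\\"])*\"");
    }

    static final Pattern STRICT = normalString(STRICT_ESCAPE);
    static final Pattern LENIENT = normalString(LENIENT_ESCAPE);

    static boolean strictAccepts(String literal) {
        return STRICT.matcher(literal).matches();
    }

    static boolean lenientAccepts(String literal) {
        return LENIENT.matcher(literal).matches();
    }

    public static void main(String[] args) {
        String[] literals = {
                "\"\\CSubsideChange\"",      // "\CSubsideChange"
                "\"\\\"CSubsideChange\"",    // "\"CSubsideChange"
                "\".\\.\\.\"",               // ".\.\."
                "\"\\\\\""                   // "\\"
        };
        for (String literal : literals) {
            System.out.printf("%-22s strict=%b lenient=%b%n",
                    literal, strictAccepts(literal), lenientAccepts(literal));
        }
    }
}
```

The lenient pattern accepts every backslash-plus-character pair, so it also accepts literals that a stricter Lua version reports as invalid. Whether that is desirable depends on which Lua version the grammar is meant to target.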

Feel free to submit a pull request with your changes.

KvanTTT · Sep 22 '23 12:09

I don't think those rules are wrong. The lexer from this repo only trips over these input strings:

a = "\CSubsideChange"
a = "\"CSubs\ideChange"
a = 'http:./.\./\'
a = "http:/.\./.\."
a = ".\.\."

which my Lua interpreter also rejects:

Lua 5.4.6  Copyright (C) 1994-2023 Lua.org, PUC-Rio

> a = "\CSubsideChange"
stdin:1: invalid escape sequence near '"\C'

> a = "\"CSubs\ideChange"
stdin:1: invalid escape sequence near '""CSubs\i'

> a = 'http:./.\./\'
stdin:1: invalid escape sequence near ''http:./.\.'

> a = "http:/.\./.\."
stdin:1: invalid escape sequence near '"http:/.\.'

> a = ".\.\."
stdin:1: invalid escape sequence near '".\.'

The following input strings are properly tokenized:

a = "\"CSubsideChange"
a = "\\"
a = "/"

Tested with:

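import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class Main {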

    private static void dump(String source) {
        LuaLexer lexer = new LuaLexer(CharStreams.fromString(source));
        CommonTokenStream stream = new CommonTokenStream(lexer);
        stream.fill();

        System.out.printf("\n%s:\n", source);

        for (Token t : stream.getTokens()) {
            System.out.printf("  %-20s '%s'%n",
                    LuaLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getText().replace("\n", "\\n"));
        }
    }

    public static void main(String[] args) {
        String[] tests = {
                "a = \"\\CSubsideChange\"", // stdin:1: invalid escape sequence near '"\C'
                "a = \"\\\"CSubsideChange\"",
                "a = \"\\\"CSubs\\ideChange\"", // stdin:1: invalid escape sequence near '""CSubs\i'
                "a = 'http:./.\\./\\'", // stdin:1: invalid escape sequence near ''http:./.\.'
                "a = \"http:/.\\./.\\.\"", // stdin:1: invalid escape sequence near '"http:/.\.'
                "a = \".\\.\\.\"", // stdin:1: invalid escape sequence near '".\.'
                "a = \"\\\\\"",
                "a = \"/\""
        };

        for (String test : tests) {
            dump(test);
        }
    }
}

Output:

line 1:4 token recognition error at: '"\C'
line 1:20 token recognition error at: '"'
line 1:4 token recognition error at: '"\"CSubs\i'
line 1:22 token recognition error at: '"'
line 1:4 token recognition error at: ''http:./.\.'
line 1:16 token recognition error at: '\'
line 1:17 token recognition error at: '''
line 1:4 token recognition error at: '"http:/.\.'
line 1:16 token recognition error at: '\'
line 1:18 token recognition error at: '"'
line 1:4 token recognition error at: '".\.'
line 1:8 token recognition error at: '\'
line 1:10 token recognition error at: '"'

a = "\CSubsideChange":
  NAME                 'a'
  null                 '='
  NAME                 'SubsideChange'
  EOF                  '<EOF>'

a = "\"CSubsideChange":
  NAME                 'a'
  null                 '='
  NORMALSTRING         '"\"CSubsideChange"'
  EOF                  '<EOF>'

a = "\"CSubs\ideChange":
  NAME                 'a'
  null                 '='
  NAME                 'deChange'
  EOF                  '<EOF>'

a = 'http:./.\./\':
  NAME                 'a'
  null                 '='
  null                 '/'
  EOF                  '<EOF>'

a = "http:/.\./.\.":
  NAME                 'a'
  null                 '='
  null                 '/'
  null                 '.'
  null                 '.'
  EOF                  '<EOF>'

a = ".\.\.":
  NAME                 'a'
  null                 '='
  null                 '.'
  EOF                  '<EOF>'

a = "\\":
  NAME                 'a'
  null                 '='
  NORMALSTRING         '"\\"'
  EOF                  '<EOF>'

a = "/":
  NAME                 'a'
  null                 '='
  NORMALSTRING         '"/"'
  EOF                  '<EOF>'

bkiers · Sep 23 '23 08:09
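
The interpreter session above shows Lua 5.4 stopping at the first backslash that does not start a known escape. A rough way to locate that position without ANTLR or a Lua installation is sketched below. The class InvalidEscapeFinder and its methods are hypothetical names introduced for illustration, and the escape list is an assumption based on the Lua 5.4 reference manual (it omits the World of Warcraft extras |, $ and # that the grammar allows, and it does not range-check \u{...} values).

```java
// Hypothetical helper: finds the first backslash in a Lua string literal
// that does not start an escape sequence accepted by standard Lua 5.4.
final class InvalidEscapeFinder {

    /** Returns the index of the offending backslash, or -1 if every escape is valid. */
    static int firstInvalidEscape(String literal) {
        int i = 0;
        while (i < literal.length()) {
            if (literal.charAt(i) != '\\') {
                i++;
                continue;
            }
            int length = escapeLength(literal, i);
            if (length < 0) {
                return i;
            }
            i += length;
        }
        return -1;
    }

    /** Length of the valid escape starting at the backslash at index i, or -1. */
    static int escapeLength(String s, int i) {
        if (i + 1 >= s.length()) {
            return -1; // lone trailing backslash
        }
        char c = s.charAt(i + 1);
        if ("abfnrtvz\\\"'\n".indexOf(c) >= 0) {
            return 2;
        }
        if (c == '\r') { // backslash followed by CR or CR LF
            return (i + 2 < s.length() && s.charAt(i + 2) == '\n') ? 3 : 2;
        }
        if (c == 'x') { // \xXX: exactly two hex digits
            return (isHex(s, i + 2) && isHex(s, i + 3)) ? 4 : -1;
        }
        if (c == 'u') { // \u{XXX}: one or more hex digits in braces
            int j = i + 2;
            if (j >= s.length() || s.charAt(j) != '{') {
                return -1;
            }
            int start = ++j;
            while (isHex(s, j)) {
                j++;
            }
            boolean closed = j < s.length() && s.charAt(j) == '}';
            return (j > start && closed) ? j + 1 - i : -1;
        }
        if (c >= '0' && c <= '9') { // \ddd: up to three decimal digits, value <= 255
            int j = i + 1;
            int value = 0;
            while (j < s.length() && j < i + 4 && s.charAt(j) >= '0' && s.charAt(j) <= '9') {
                value = value * 10 + (s.charAt(j) - '0');
                j++;
            }
            return value <= 255 ? j - i : -1;
        }
        return -1;
    }

    private static boolean isHex(String s, int index) {
        return index < s.length() && Character.digit(s.charAt(index), 16) >= 0;
    }

    public static void main(String[] args) {
        String[] literals = { "\"\\CSubsideChange\"", "\"\\\"CSubs\\ideChange\"", "\"\\\\\"" };
        for (String literal : literals) {
            System.out.printf("%-26s first invalid escape at index %d%n",
                    literal, firstInvalidEscape(literal));
        }
    }
}
```

The index is relative to the start of the literal, including its opening quote, so it can be added to the column reported in the token recognition errors above to point at the backslash itself.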

OMG, thank you, bkiers. This was my mistake: I am working with Lua 5.1.4, not Lua 5.4.

Dongyang0810 · Oct 10 '23 03:10