grammars-v4
Parsing Lua long comment as short comment
Hi,
The following valid Lua input cannot be successfully parsed by the current antlr grammar:
local function name_s ( ) --[===[]===] end
The error output:
line 1:26 mismatched input '--[===[]===] end ' expecting 'end'
After some inspection, I found that the parser treats the input --[===[]===] end as a line comment (short comment) and then complains about a missing end token.
I think the solution would be to give long comments precedence over short comments.
Hi @bendrissou
Updating the line comment as follows resolves this issue but I'm not sure if this change complies with the spec.
LINE_COMMENT
: '--' ( ~[\r\n\u005b\u0085\u2028\u2029] SingleLineInputCharacter* )? -> channel(HIDDEN)
;
From the spec:
A comment starts with a double hyphen (--) anywhere outside a string. If the text immediately after -- is not an opening long bracket, the comment is a short comment, which runs until the end of the line. Otherwise, it is a long comment, which runs until the corresponding closing long bracket.
Hi @msagca
That does resolve the issue. But now we can't have line comments that start with the symbol [.
@bendrissou
I'm not sure what you mean by line comments starting with [, can you give an example?
Here is a short example:
local function name_s ( ) --[aaa
end
This is rejected by the new grammar. But accepted by the Lua compiler.
@bendrissou
The paragraph I quoted earlier from the spec suggests that --[aaa shall be treated as a long comment. In this case, you could prepend an additional hyphen to make it ---[aaa, then it should be recognized as a short comment. Am I getting it wrong?
Hi @msagca
Yes, you are right. This conforms to the spec.
Though the official implementation seems to treat --[aaa as a short comment, which should not be the case. Instead it should be an incomplete long comment.
Lua 5.4.6 treats --[aaa as a line comment, not a multi-line comment. The doc states that an opening long bracket is an opening square bracket, followed by zero or more equal signs, followed by another opening square bracket. Trying many different examples and reading the lexer source code confirms this. If the text after -- is not a well-formed opening long bracket, the fall-through is a line comment.
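To make the fall-through rule concrete, here is a minimal Python sketch of the classification step only. It is not the code from the PR or from the Lua lexer; the function name comment_kind and its return shape are made up for illustration.

```python
import re

# An opening long bracket is '[', zero or more '=', then '['.
_OPEN_LONG_BRACKET = re.compile(r"\[(=*)\[")


def comment_kind(source, pos=0):
    """Classify the comment starting at source[pos] as 'long' or 'short'.

    Returns (kind, level). The level is the number of '=' signs for a
    long comment and None for a short one. (Hypothetical helper.)
    """
    if not source.startswith("--", pos):
        raise ValueError("no comment starts at this position")
    match = _OPEN_LONG_BRACKET.match(source, pos + 2)
    if match is None:
        # Not a well-formed opening long bracket: short comment.
        return ("short", None)
    return ("long", len(match.group(1)))
```

With this sketch, --[aaa and ---[aaa both come out as short comments, while --[===[]===] end is a long comment of level 3.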
I have a PR for Lua. I intend to add code to lex comments properly. Unfortunately, the code must count the number of '=' signs (long comments of different levels can be nested inside one another), which means the grammar must now be split and written with target-specific base class code.
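The counting requirement can be sketched in a few lines of Python. This is only an illustration of the idea, not the base class code from the PR: the name comment_end is hypothetical, only "\n" is treated as a line terminator, and raising ValueError for an unterminated long comment is an assumption of this sketch.

```python
def comment_end(source, pos):
    """Return the index just past the comment that starts at source[pos]."""
    if not source.startswith("--", pos):
        raise ValueError("no comment starts at this position")
    i = pos + 2
    # Try to read an opening long bracket: '[' '='* '['.
    if i < len(source) and source[i] == "[":
        j = i + 1
        while j < len(source) and source[j] == "=":
            j += 1
        if j < len(source) and source[j] == "[":
            level = j - (i + 1)  # number of '=' signs seen
            closer = "]" + "=" * level + "]"
            end = source.find(closer, j + 1)
            if end == -1:
                raise ValueError("unfinished long comment")
            return end + len(closer)
    # Fall-through: short comment, runs to the end of the line.
    newline = source.find("\n", pos)
    return len(source) if newline == -1 else newline
```

For the input --[==[ a ]] b ]==] x, the inner ]] does not close the comment because its level is 0, not 2; only ]==] does. A fixed lexer rule cannot express "the same number of '=' signs as the opener", which is why the count has to be carried in code.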
#3652
@Dongyang0810 PR https://github.com/antlr/grammars-v4/pull/3752 handles this completely. You give two inputs: one before the picture in the initial comment, and a second one shown in the picture. I'll go through both.
Input 1:
--[[some comment ]] local A = 10
The parse tree is:
This is correct. The long comment ends on line 1, column 19, and the non-commented code begins on column 20.
Input 2:
--[[aaa]]AA=1
--[a]=11
a=1
The parse tree is:
This is correct. The long comment on line 1 ends on column 9, and the statement begins on line 1, column 10. The second line is entirely a single-line comment because long comments must open with two square brackets, with 0 to n equal signs between them, and the number of equal signs in the closing bracket must match. If it's not a long comment, it is a single-line comment, which terminates at the end of line 2. Line 3 is a new assignment statement.
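As a quick cross-check of both inputs outside ANTLR, here is a small Python sketch that strips comments with a regular expression. It relies on a backreference to match the closing level, a feature Python's re module has but ANTLR lexer rules do not. It is a simplification: it ignores string literals, and an unterminated long comment silently falls back to a short comment instead of being reported as an error. The names COMMENT and strip_comments are made up for this example.

```python
import re

# Long comment first: the backreference \1 forces the closing bracket to
# carry the same number of '=' signs as the opening one. Otherwise fall
# through to a short comment that runs to the end of the line.
COMMENT = re.compile(r"--\[(=*)\[.*?\]\1\]|--[^\n]*", re.DOTALL)


def strip_comments(source):
    """Remove Lua comments from source (string literals are not handled)."""
    return COMMENT.sub("", source)


source = "--[[aaa]]AA=1\n--[a]=11\na=1\n"
print(repr(strip_comments(source)))  # prints 'AA=1\n\na=1\n'
```

Line 1 keeps AA=1, line 2 disappears entirely, and line 3 keeps a=1, which matches the parse described above.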
Discussion
Long comments cannot be lexed without semantic predicates. Matching the closing long bracket must include counting the number of equal signs, just as the lexer in the Lua source code does. In fact, the PR defines functions with the same names and almost the same code as in the Lua source code, which means it should be easier to maintain if the source code itself changes. I.e., just do what the Lua interpreter does; don't think. (In fact, I was debugging the ANTLR-generated parser side by side with a debugger on the C code for Lua.) That said, I am disappointed that the grammar given in the manual is not exactly the grammar implemented in the Lua parser source code.