Parser failed unexpectedly
I generated my parser with the C++ target, but running it fails with: Segmentation fault (core dumped). However, when I generate it with the Python target, it works normally. My ANTLR4 tool version is 4.13.2.
When I change the rule extra_decls: Less extra_decl (Comma extra_decl)* Greater; to extra_decls: LeftBrace extra_decl (Comma extra_decl)* RightBrace;, or change atom_suffix: Less identifier Greater; to atom_suffix: LeftBrace identifier RightBrace;, it works.
I think there may be something wrong with the ANTLR4 C++ runtime. I wonder why this happens and how to fix it.
The following are my grammar and test code:
UniASTParser.g4
parser grammar UniASTParser;
options {
tokenVocab = UniASTLexer;
}
rules
: ruleSpec+ EOF
;
ruleSpec
: parserRuleSpec
;
parserRuleSpec
: ParserRuleName Colon ruleBlock extra_decls? Semi
;
extra_decls
: Less extra_decl (Comma extra_decl)* Greater
;
extra_decl
: identifier Assign identifier Less (identifier | StringLiteral) Greater
;
ruleBlock
: ruleAltList
;
ruleAltList
: actionAlt
;
actionAlt
: alternative
;
alternative
: element+
;
element
: atomOrGroup
;
atomOrGroup
: atom
;
atom_suffix: Less identifier Greater;
atom
: terminalDef atom_suffix?
| ruleref atom_suffix?
;
terminalDef
: LexerRuleName
| StringLiteral
;
ruleref
: ParserRuleName
;
identifier
: LexerRuleName
| ParserRuleName
;
UniASRLexer.g4
lexer grammar UniASTLexer;
fragment NONDIGIT: [a-zA-Z_];
fragment UPPERCASE: [A-Z];
fragment LOWERCASE: [a-z];
fragment ALLCASE: [a-zA-Z0-9_];
// string literal
fragment SHORT_STRING_LITERAL:
'\'' SHORT_STRING_ITEM_FOR_SINGLE_QUOTE* '\''
| '"' SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE* '"';
fragment SHORT_STRING_ITEM_FOR_SINGLE_QUOTE:
SHORT_STRING_CHAR_NO_SINGLE_QUOTE
| STRING_ESCAPE_SEQ;
fragment SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE:
SHORT_STRING_CHAR_NO_DOUBLE_QUOTE
| STRING_ESCAPE_SEQ;
fragment SHORT_STRING_CHAR_NO_SINGLE_QUOTE: ~[\\\r\n'];
fragment STRING_ESCAPE_SEQ: '\\' OS_INDEPENDENT_NL | '\\' .;
fragment SHORT_STRING_CHAR_NO_DOUBLE_QUOTE: ~[\\\r\n"];
fragment OS_INDEPENDENT_NL: '\r'? '\n';
fragment LONG_STRING_LITERAL:
'\'\'\'' LONG_STRING_ITEM*? '\'\'\''; // | '"""' LONG_STRING_ITEM*? '"""'
fragment LONG_STRING_ITEM: LONG_STRING_CHAR | STRING_ESCAPE_SEQ;
fragment LONG_STRING_CHAR: ~'\\';
// integer literal
fragment INTEGER:
DEC_INTEGER
| BIN_INTEGER
| OCT_INTEGER
| HEX_INTEGER;
fragment DEC_INTEGER:
NON_ZERO_DIGIT ('_'? DIGIT)*
| '0'+ ('_'? '0')*;
fragment BIN_INTEGER: '0' ('b' | 'B') ('_'? BIN_DIGIT)+;
fragment OCT_INTEGER: '0' ('o' | 'O') ('_'? OCT_DIGIT)+;
fragment HEX_INTEGER: '0' ('x' | 'X') ('_'? HEX_DIGIT)+;
fragment NON_ZERO_DIGIT: [1-9];
fragment DIGIT: [0-9];
fragment BIN_DIGIT: '0' | '1';
fragment OCT_DIGIT: [0-7];
fragment HEX_DIGIT: DIGIT | [a-f] | [A-F];
// floatpoint literal
fragment FLOAT_NUMBER: POINT_FLOAT | EXPONENT_FLOAT;
fragment POINT_FLOAT: DIGIT_PART? FRACTION | DIGIT_PART '.';
fragment EXPONENT_FLOAT: (DIGIT_PART | POINT_FLOAT) EXPONENT;
fragment DIGIT_PART: DIGIT ('_'? DIGIT)*;
fragment FRACTION: '.' DIGIT_PART;
fragment EXPONENT: ('e' | 'E') ('+' | '-')? DIGIT_PART;
// boolean literal
fragment TRUE: 'True';
fragment FALSE: 'False';
// key words
FEGEN: 'fegen';
DEF: 'def';
INPUTS: 'inputs';
RETURNS: 'returns';
ACTIONS: 'actions';
IR: 'ir';
OPERAND_VALUE: 'operandValue';
ATTRIBUTE_VALUE: 'attributeValue';
CPP_VALUE: 'cppValue';
OPERATION: 'operation';
FUNCTION: 'function';
TYPEDEF: 'typedef';
OPDEF: 'opdef';
ARGUMENTS: 'arguments';
RESULTS: 'results';
BODY: 'body';
EMPTY: 'null';
PARAMETERS: 'parameters';
ASSEMBLY_FORMAT: 'assemblyFormat';
CLASS: 'class';
SELF: 'self';
// types
TYPE: 'Type';
BOOL: 'bool';
INT: 'int';
FLOAT: 'float';
STRING: 'string';
LIST: 'list';
MAP: 'map';
// stmt
IF: 'if';
ELIF: 'elif';
ELSE: 'else';
FOR: 'for';
IN: 'in';
WHILE: 'while';
RETURN: 'return';
VARIABLE: 'variable';
// marks
AND: 'and';
OR: 'or';
NOT: 'not';
IS: 'is';
Equal: '==';
NotEq: '!=';
Less: '<';
LessEq: '<=';
Greater: '>';
GreaterEq: '>=';
AT: '@';
DivDiv: '//';
Comma: ',';
Semi: ';';
LeftParen: '(';
RightParen: ')';
LeftBracket: '[';
RightBracket: ']';
LeftBrace: '{';
RightBrace: '}';
Dot: '.';
Colon: ':';
AlterOp: '|';
QuestionMark: '?';
Star: '*';
Div: '/';
Plus: '+';
Minus: '-';
Assign: '=';
StarStar: '**';
MOD: '%';
Arror: '->';
Tilde: '~';
Range: '..';
// literal
StringLiteral: SHORT_STRING_LITERAL | LONG_STRING_LITERAL;
BoolLiteral: TRUE | FALSE;
IntegerLiteral: INTEGER;
FloatPointLiteral: FLOAT_NUMBER;
// identifiers
LexerRuleName: UPPERCASE (NONDIGIT | DIGIT)*;
ParserRuleName: LOWERCASE (NONDIGIT | DIGIT)*;
Whitespace: [ \t]+ -> skip;
Newline: ('\r' '\n'? | '\n') -> skip;
BlockComment: '/*' .*? '*/' -> skip;
LineComment: '//' ~ [\r\n]* -> skip;
test.cpp
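#include "UniASTLexer.h"
#include "UniASTParser.h"
#include "antlr4-runtime.h"
#include <fstream>
#include <iostream>
using namespace antlr4;
using namespace std;
int main(int argc, char **argv) {
  if (argc != 2) {
    cerr << "no input file" << endl;
    return 1;
  }
  // Open the input file and stop early if it cannot be read.
  std::ifstream stream(argv[1]);
  if (!stream) {
    cerr << "cannot open " << argv[1] << endl;
    return 1;
  }
  // Lexer -> token stream -> parser pipeline.
  ANTLRInputStream input(stream);
  UniAST::UniASTLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  UniAST::UniASTParser parser(&tokens);
  // Parse from the start rule and print the matched text.
  std::cout << parser.rules()->getText() << std::endl;
  return 0;
}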
test file
module: assign_stmt{assign_stmt};
assign_stmt: variable_access{variable_access} Assign expression{expression};
The posted parser grammar cannot match the {assign_stmt}; part of your input, because it does not use LeftBrace or the string literal { anywhere. Also, UniASRLexer.g4 cannot be the real file name. It has to be UniASTLexer.g4, because the grammar is declared as lexer grammar UniASTLexer;.
Thus, it appears you have a build issue. I recommend that you write a build script that cleans up all generated files, runs the Antlr tool, and recompiles everything from scratch. As you are using the Cpp target, write a Bash script with the two Antlr tool calls in the correct order, followed by g++ or whatever compiler you are using.
The segfault looks like a symptom of incorrect compiler and linker flags, e.g., you are not compiling with pthreads.
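A clean-rebuild script along the lines described above might look like the following. This is only a sketch: the jar path, the runtime include and library directories, the output directory, and the binary name uniast_test are all assumptions, and -package UniAST is inferred from the UniAST:: namespace used in test.cpp. It prints the commands by default so the paths can be checked first; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Clean-rebuild sketch for the ANTLR4 C++ target.
# Every path below is an assumption; override it via the environment.
set -euo pipefail

ANTLR_JAR="${ANTLR_JAR:-antlr-4.13.2-complete.jar}"             # assumed jar location
RUNTIME_INC="${RUNTIME_INC:-/usr/local/include/antlr4-runtime}" # assumed header dir
RUNTIME_LIB="${RUNTIME_LIB:-/usr/local/lib}"                    # assumed library dir
OUT="${OUT:-generated}"                                         # dir for generated code

# Print a command; execute it only when RUN=1 (dry run by default).
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

build() {
  # 1. Remove stale generated sources and the old binary.
  run rm -rf "$OUT" uniast_test
  # 2. Lexer first: the parser's tokenVocab option needs UniASTLexer.tokens.
  run java -jar "$ANTLR_JAR" -Dlanguage=Cpp -package UniAST -o "$OUT" UniASTLexer.g4
  run java -jar "$ANTLR_JAR" -Dlanguage=Cpp -package UniAST -o "$OUT" -lib "$OUT" UniASTParser.g4
  # 3. Recompile everything from scratch, with thread support enabled.
  run g++ -std=c++17 -pthread -I"$RUNTIME_INC" -I"$OUT" \
      test.cpp "$OUT"/*.cpp -L"$RUNTIME_LIB" -lantlr4-runtime -o uniast_test
}

build
```

Running it as RUN=1 ./build.sh (the script name is hypothetical) performs the actual rebuild. Keeping the lexer call before the parser call matters because the parser grammar reads the token vocabulary that the lexer run generates.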