antlr4

Parser failed unexpectedly

Open • CBalaa opened this issue 10 months ago • 1 comment

I am using the C++ target to build my parser, but running it fails with: Segmentation fault (core dumped). When I build it with the Python target instead, it works normally. My ANTLR4 tool version is 4.13.2.

When I change the rule extra_decls: Less extra_decl (Comma extra_decl)* Greater; to extra_decls: LeftBrace extra_decl (Comma extra_decl)* RightBrace;, or change atom_suffix: Less identifier Greater; to atom_suffix: LeftBrace identifier RightBrace;, it works.

I think there may be something wrong with the ANTLR4 C++ runtime. Why does this happen, and how can I fix it?

The following are my grammars and test code:

UniASTParser.g4

parser grammar UniASTParser;

options {
    tokenVocab = UniASTLexer;
}

rules
    : ruleSpec+ EOF
    ;

ruleSpec
    : parserRuleSpec
    ;

parserRuleSpec
    : ParserRuleName Colon ruleBlock extra_decls? Semi
    ;

extra_decls
    : Less extra_decl (Comma extra_decl)* Greater
    ;

extra_decl
    : identifier Assign identifier Less (identifier | StringLiteral) Greater
    ;

ruleBlock
    : ruleAltList
    ;

ruleAltList
    : actionAlt
    ;

actionAlt
    : alternative
    ;

alternative
    : element+
    ;

element
    : atomOrGroup
    ;

atomOrGroup
    : atom 
    ;

atom_suffix: Less identifier Greater;

atom
    : terminalDef atom_suffix?
    | ruleref atom_suffix?
    ;

terminalDef
    : LexerRuleName
    | StringLiteral
    ;

ruleref
    : ParserRuleName
    ;

identifier
    : LexerRuleName
    | ParserRuleName
    ;

UniASRLexer.g4

lexer grammar UniASTLexer;

fragment NONDIGIT: [a-zA-Z_];

fragment UPPERCASE: [A-Z];

fragment LOWERCASE: [a-z];

fragment ALLCASE: [a-zA-Z0-9_];

// string literal
fragment SHORT_STRING_LITERAL:
	'\'' SHORT_STRING_ITEM_FOR_SINGLE_QUOTE* '\''
	| '"' SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE* '"';

fragment SHORT_STRING_ITEM_FOR_SINGLE_QUOTE:
	SHORT_STRING_CHAR_NO_SINGLE_QUOTE
	| STRING_ESCAPE_SEQ;
fragment SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE:
	SHORT_STRING_CHAR_NO_DOUBLE_QUOTE
	| STRING_ESCAPE_SEQ;

fragment SHORT_STRING_CHAR_NO_SINGLE_QUOTE: ~[\\\r\n'];

fragment STRING_ESCAPE_SEQ: '\\' OS_INDEPENDENT_NL | '\\' .;

fragment SHORT_STRING_CHAR_NO_DOUBLE_QUOTE: ~[\\\r\n"];

fragment OS_INDEPENDENT_NL: '\r'? '\n';

fragment LONG_STRING_LITERAL:
	'\'\'\'' LONG_STRING_ITEM*? '\'\'\''; // | '"""' LONG_STRING_ITEM*? '"""'

fragment LONG_STRING_ITEM: LONG_STRING_CHAR | STRING_ESCAPE_SEQ;

fragment LONG_STRING_CHAR: ~'\\';

// integer literal
fragment INTEGER:
	DEC_INTEGER
	| BIN_INTEGER
	| OCT_INTEGER
	| HEX_INTEGER;
fragment DEC_INTEGER:
	NON_ZERO_DIGIT ('_'? DIGIT)*
	| '0'+ ('_'? '0')*;
fragment BIN_INTEGER: '0' ('b' | 'B') ('_'? BIN_DIGIT)+;
fragment OCT_INTEGER: '0' ('o' | 'O') ('_'? OCT_DIGIT)+;
fragment HEX_INTEGER: '0' ('x' | 'X') ('_'? HEX_DIGIT)+;
fragment NON_ZERO_DIGIT: [1-9];
fragment DIGIT: [0-9];
fragment BIN_DIGIT: '0' | '1';
fragment OCT_DIGIT: [0-7];
fragment HEX_DIGIT: DIGIT | [a-f] | [A-F];

// floatpoint literal
fragment FLOAT_NUMBER: POINT_FLOAT | EXPONENT_FLOAT;
fragment POINT_FLOAT: DIGIT_PART? FRACTION | DIGIT_PART '.';
fragment EXPONENT_FLOAT: (DIGIT_PART | POINT_FLOAT) EXPONENT;
fragment DIGIT_PART: DIGIT ('_'? DIGIT)*;
fragment FRACTION: '.' DIGIT_PART;
fragment EXPONENT: ('e' | 'E') ('+' | '-')? DIGIT_PART;

// boolean literal
fragment TRUE: 'True';
fragment FALSE: 'False';

// key words

FEGEN: 'fegen';

DEF: 'def';

INPUTS: 'inputs';

RETURNS: 'returns';

ACTIONS: 'actions';

IR: 'ir';

OPERAND_VALUE: 'operandValue';

ATTRIBUTE_VALUE: 'attributeValue';

CPP_VALUE: 'cppValue';

OPERATION: 'operation';

FUNCTION: 'function';

TYPEDEF: 'typedef';

OPDEF: 'opdef';

ARGUMENTS: 'arguments';

RESULTS: 'results';

BODY: 'body';

EMPTY: 'null';

PARAMETERS: 'parameters';

ASSEMBLY_FORMAT: 'assemblyFormat';

CLASS: 'class';

SELF: 'self';

// types
TYPE: 'Type';

BOOL: 'bool';

INT: 'int';

FLOAT: 'float';

STRING: 'string';

LIST: 'list';

MAP: 'map';

// stmt

IF: 'if';

ELIF: 'elif';

ELSE: 'else';

FOR: 'for';

IN: 'in';

WHILE: 'while';

RETURN: 'return';

VARIABLE: 'variable';

// marks

AND: 'and';

OR: 'or';

NOT: 'not';

IS: 'is';

Equal: '==';

NotEq: '!=';

Less: '<';

LessEq: '<=';

Greater: '>';

GreaterEq: '>=';

AT: '@';

DivDiv: '//';

Comma: ',';

Semi: ';';

LeftParen: '(';

RightParen: ')';

LeftBracket: '[';

RightBracket: ']';

LeftBrace: '{';

RightBrace: '}';

Dot: '.';

Colon: ':';

AlterOp: '|';

QuestionMark: '?';

Star: '*';

Div: '/';

Plus: '+';

Minus: '-';

Assign: '=';

StarStar: '**';

MOD: '%';

Arror: '->';

Tilde: '~';

Range: '..';

// literal

StringLiteral: SHORT_STRING_LITERAL | LONG_STRING_LITERAL;

BoolLiteral: TRUE | FALSE;

IntegerLiteral: INTEGER;

FloatPointLiteral: FLOAT_NUMBER;

// identifiers

LexerRuleName: UPPERCASE (NONDIGIT | DIGIT)*;

ParserRuleName: LOWERCASE (NONDIGIT | DIGIT)*;


Whitespace: [ \t]+ -> skip;

Newline: ('\r' '\n'? | '\n') -> skip;

BlockComment: '/*' .*? '*/' -> skip;

LineComment: '//' ~ [\r\n]* -> skip;

test.cpp

#include "UniASTLexer.h"
#include "UniASTParser.h"
#include "antlr4-runtime.h"
#include <fstream>
#include <iostream>

using namespace antlr4;
using namespace std;

int main(int argc, char **argv) {
  if (argc != 2) {
    cerr << "no input file" << endl;
    return 1;
  }
  // Open the input file and fail clearly if it cannot be read.
  std::ifstream stream(argv[1]);
  if (!stream) {
    cerr << "cannot open " << argv[1] << endl;
    return 1;
  }
  ANTLRInputStream input(stream);
  UniAST::UniASTLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  UniAST::UniASTParser parser(&tokens);
  // Parse from the start rule and print the text it matched.
  std::cout << parser.rules()->getText() << std::endl;
  return 0;
}

test file

module: assign_stmt{assign_stmt};
assign_stmt: variable_access{variable_access} Assign expression{expression};

CBalaa, Feb 19 '25 08:02
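The test input contains { and }, which UniASTLexer turns into LeftBrace and RightBrace tokens, but no rule in UniASTParser.g4 as posted references either token type, so this input cannot match the grammar. The minimal C++ sketch below makes that check explicit without the ANTLR runtime. It is a hand-written approximation of only the part of UniASTLexer that the test file exercises (keyword rules and multi-character operators are not modeled), and the helper names approx_token_types and unreferenced_token_types are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: a hand-written approximation of the subset of
// UniASTLexer used by the test file. It is NOT the generated lexer:
// keyword rules (e.g. 'variable') and multi-character operators such as
// '<=' or '->' are not modeled; unknown characters are tagged "<unmodeled>".
std::vector<std::string> approx_token_types(const std::string& src) {
  std::vector<std::string> types;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {  // Whitespace and Newline are skipped
      ++i;
      continue;
    }
    if (std::isalpha(c)) {
      // LexerRuleName starts upper-case, ParserRuleName lower-case; both
      // continue with letters, digits, or '_'.
      std::size_t j = i + 1;
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
        ++j;
      }
      types.push_back(std::isupper(c) ? "LexerRuleName" : "ParserRuleName");
      i = j;
      continue;
    }
    switch (c) {
      case '{': types.push_back("LeftBrace"); break;
      case '}': types.push_back("RightBrace"); break;
      case '<': types.push_back("Less"); break;
      case '>': types.push_back("Greater"); break;
      case ':': types.push_back("Colon"); break;
      case ';': types.push_back("Semi"); break;
      case ',': types.push_back("Comma"); break;
      case '=': types.push_back("Assign"); break;
      default: types.push_back("<unmodeled>"); break;
    }
    ++i;
  }
  return types;
}

// Token types referenced anywhere in UniASTParser.g4 as posted
// (transcribed by hand from the grammar).
const std::set<std::string> kParserTokens = {
    "ParserRuleName", "LexerRuleName", "StringLiteral", "Colon", "Semi",
    "Comma",          "Assign",        "Less",          "Greater"};

// Hypothetical helper: returns the token types produced by src that no
// parser rule can consume.
std::set<std::string> unreferenced_token_types(const std::string& src) {
  std::set<std::string> missing;
  for (const std::string& t : approx_token_types(src)) {
    if (kParserTokens.count(t) == 0) missing.insert(t);
  }
  return missing;
}
```

Calling unreferenced_token_types on the two lines of the test file returns LeftBrace and RightBrace; writing the suffixes with < and > instead leaves nothing unreferenced.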

The parser cannot match {assign_stmt}; in your input because you don't use LeftBrace or the string literal '{' anywhere in the parser grammar. Also, UniASRLexer.g4 is the wrong file name: it should be UniASTLexer.g4, because the file declares lexer grammar UniASTLexer;. So it appears you have a build issue.

I recommend writing a build script that cleans up all generated files, runs the ANTLR tool, and recompiles everything from scratch. Since you are using the C++ target, write a Bash script that makes the two ANTLR tool calls in the correct order (lexer grammar first, then parser grammar), followed by g++ or whichever compiler you use.

The segfault looks like your compiler and linker flags are not correct, e.g., you are not compiling with pthreads.

kaby76, Feb 19 '25 09:02
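The build script recommended in the reply would normally be Bash; to stay with the issue's C++, the sketch below only assembles the commands such a script would run. It checks that each grammar's file name matches its declaration (catching UniASRLexer.g4 versus lexer grammar UniASTLexer), orders the ANTLR tool runs lexer-first because the parser's tokenVocab option reads the UniASTLexer.tokens file produced by the lexer run, and builds a g++ line with -std=c++17 and -pthread. The tool invocation, the -package UniAST namespace (inferred from the UniAST:: qualifiers in test.cpp), the include path, and the antlr4-runtime library name are assumptions about a typical setup, and all helper names are hypothetical.

```cpp
#include <algorithm>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// One grammar file: its base file name and its text.
struct GrammarFile {
  std::string file_name;  // e.g. "UniASTLexer.g4"
  std::string text;       // file contents
};

// Kind ("lexer", "parser", or "" for a combined grammar) and declared name.
struct Decl {
  std::string kind;
  std::string name;
};

// Matches "lexer grammar X;", "parser grammar X;" or "grammar X;".
const std::regex kGrammarDecl(R"(\b(?:(lexer|parser)\s+)?grammar\s+(\w+)\s*;)");

Decl find_decl(const std::string& text) {
  std::smatch m;
  if (!std::regex_search(text, m, kGrammarDecl)) {
    throw std::runtime_error("no grammar declaration found");
  }
  return {m[1].str(), m[2].str()};
}

// Returns one ANTLR tool call per grammar, lexer grammars first.
// tool is assumed to be something like "java -jar antlr-4.13.2-complete.jar".
std::vector<std::string> antlr_commands(std::vector<GrammarFile> files,
                                        const std::string& tool,
                                        const std::string& package) {
  // ANTLR expects X.g4 to declare grammar X; fail before running anything.
  for (const GrammarFile& f : files) {
    const std::string name = find_decl(f.text).name;
    if (f.file_name != name + ".g4") {
      throw std::runtime_error("grammar " + name + " must be in " + name +
                               ".g4, not " + f.file_name);
    }
  }
  // Lexer first: the parser's tokenVocab option reads the .tokens file
  // that only the lexer run generates.
  std::stable_partition(files.begin(), files.end(), [](const GrammarFile& f) {
    return find_decl(f.text).kind == "lexer";
  });
  std::vector<std::string> cmds;
  for (const GrammarFile& f : files) {
    cmds.push_back(tool + " -Dlanguage=Cpp -package " + package + " " + f.file_name);
  }
  return cmds;
}

// Assumed compile line: the include path and library name depend on how the
// runtime was installed; the 4.13 C++ runtime needs C++17, and -pthread
// follows the suggestion in the reply. UniAST*.cpp is a shell glob for the
// generated lexer, parser, and listener sources.
std::string compile_command(const std::string& runtime_include) {
  return "g++ -std=c++17 -pthread -I" + runtime_include +
         " test.cpp UniAST*.cpp -lantlr4-runtime -o test";
}
```

In practice you would delete the previously generated .cpp, .h, .tokens, and .interp files, then run the returned ANTLR commands in order, followed by the compile line, from a shell or via std::system.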