Cleaner EBNF grammar

Open brupelo opened this issue 6 years ago • 18 comments

Would it be possible to modify the current EBNF grammar syntax? Currently you're forced to put the colon and the first alternative on the same line as the rule name:

grammar = '''
    rule : rule1
        | rule2
'''

Instead, you could have clean, indented blocks like this (the format used in the GLSL specs):

grammar = '''
    rule : 
        rule1
        rule2
'''

or maybe (not a very clean one):

grammar = '''
    rule : 
         | rule1
         | rule2
'''

or:

grammar = '''
    rule 
         : rule1
         | rule2
'''

or (inspired from Antlr4 ):

grammar = '''
    rule 
         : rule1
         | rule2
         ;
'''

Rationale: the EBNF grammar becomes much more readable, and because the grammar now has proper indentation you can also fold long grammars easily in your favourite text editor; example here.

I guess it's a matter of tweaking this file a bit. Even if you don't like the idea, could you explain how you'd do it?

Thanks.

brupelo avatar May 26 '18 10:05 brupelo

I like these versions:

1.

        rule: 
            rule1
            rule2

2.

        rule 
             : rule1
             | rule2

The first one is cleaner and the positions of the alternatives are easy to rearrange. But I'm worried that it's not clear enough that each line is a different option.

The second one is a little bit weird, but it might work.

I'll have to give it some thought, to make sure it won't collide with other concepts.

erezsh avatar May 26 '18 11:05 erezsh

Yeah, I do agree, the first one is the "optimal" one from the whole set of EBNF grammars (as it doesn't contain redundant/verbose elements, it's the most pythonic one :))

Also, did you look at this repo? It contains a lot of EBNF grammars ready to go... it'd be cool if they could be used out of the box with lark, or maybe converting them automatically to lark... I guess adding more arguments to the Lark constructor to specify which type of EBNF you're dealing with would be ugly.

I mean, I guess this is some sort of tradeoff... usually you want your functions/constructor/ui/gui to be as minimal as possible so they become clear as water for users, quoting:

The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification - and then shouldn't be used anyway.

Anyway, just give it some thought; I like to bring new use cases and usability improvements to the table ;)

B.

NS: When I said "converting them" automatically I meant maybe creating some sort of script (without modifying the lark core so the code doesn't become more complex without any real reason)

brupelo avatar May 26 '18 12:05 brupelo

it'd be cool if they could be used out of the box with lark, or maybe converting them automatically to lark

Yes, that's a good idea. I'd say converting them is the way to go. However, not all of them can be converted; many of them require code that resides in the grammar to work correctly.

The conversion script can be added to lark.tools.
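
A minimal sketch of what such a conversion could look like for the simplest case, the indented-block format from the GLSL spec. The function name `blocks_to_lark` is hypothetical (it is not part of lark or lark.tools), and it assumes every rule header sits unindented on its own line ending with a colon and every alternative fits on one indented line:

```python
def blocks_to_lark(text):
    """Rewrite 'name :' headers followed by indented alternatives
    into the 'name : alt1' / '    | alt2' layout."""
    out = []
    first = False  # True right after a header: the next alternative joins the header line
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # drop blank lines
        if not line[0].isspace() and line.endswith(":"):
            # Unindented header such as "unary_operator :"
            out.append(line[:-1].rstrip() + " :")
            first = True
        elif first:
            # First alternative goes on the same line as the header
            out[-1] += " " + line.strip()
            first = False
        else:
            # Every further alternative gets a leading '|'
            out.append("    | " + line.strip())
    return "\n".join(out)


print(blocks_to_lark("unary_operator :\n    PLUS\n    DASH\n"))
```

A production wrapped over two physical lines would be misread as two alternatives, and comments such as /* empty */ are passed through untouched, so real grammars would still need manual cleanup.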

erezsh avatar May 26 '18 14:05 erezsh

I'm concerned that the nonterminal : | terminal | nonterminal ;

could be understood as an "empty" right-hand side, but I think we should have some way of being explicit that the right-hand side matches "nothing", i.e. no input. Since "[someterminal]" means that someterminal is optional, perhaps we could use the syntax [] to mean an explicit "empty".

whitten avatar May 26 '18 14:05 whitten

Guys, just to make my point clearer (sometimes the best way to prove something is by presenting visual samples), I want you to take a look at the comparison below between the syntax used here and the lark one:

GLSLangSpec.4.60.original_ebnf

variable_identifier :
    IDENTIFIER
primary_expression :
    variable_identifier
    INTCONSTANT
    UINTCONSTANT
    FLOATCONSTANT
    BOOLCONSTANT
    DOUBLECONSTANT
    LEFT_PAREN expression RIGHT_PAREN
postfix_expression :
    primary_expression
    postfix_expression LEFT_BRACKET integer_expression RIGHT_BRACKET
    function_call
    postfix_expression DOT FIELD_SELECTION
    postfix_expression INC_OP
    postfix_expression DEC_OP
integer_expression :
    expression
function_call :
    function_call_or_method
function_call_or_method :
    function_call_generic
function_call_generic :
    function_call_header_with_parameters RIGHT_PAREN
    function_call_header_no_parameters RIGHT_PAREN
function_call_header_no_parameters :
    function_call_header VOID
    function_call_header
function_call_header_with_parameters :
    function_call_header assignment_expression
    function_call_header_with_parameters COMMA assignment_expression
function_call_header :
    function_identifier LEFT_PAREN
function_identifier :
    type_specifier
    postfix_expression
unary_expression :
    postfix_expression
    INC_OP unary_expression
    DEC_OP unary_expression
    unary_operator unary_expression
unary_operator :
    PLUS
    DASH
    BANG
    TILDE
multiplicative_expression :
    unary_expression
    multiplicative_expression STAR unary_expression
    multiplicative_expression SLASH unary_expression
    multiplicative_expression PERCENT unary_expression
additive_expression :
    multiplicative_expression
    additive_expression PLUS multiplicative_expression
    additive_expression DASH multiplicative_expression
shift_expression :
    additive_expression
    shift_expression LEFT_OP additive_expression
    shift_expression RIGHT_OP additive_expression
relational_expression :
    shift_expression
    relational_expression LEFT_ANGLE shift_expression
    relational_expression RIGHT_ANGLE shift_expression
    relational_expression LE_OP shift_expression
    relational_expression GE_OP shift_expression
equality_expression :
    relational_expression
    equality_expression EQ_OP relational_expression
    equality_expression NE_OP relational_expression
and_expression :
    equality_expression
    and_expression AMPERSAND equality_expression
exclusive_or_expression :
    and_expression
    exclusive_or_expression CARET and_expression
inclusive_or_expression :
    exclusive_or_expression
    inclusive_or_expression VERTICAL_BAR exclusive_or_expression
logical_and_expression :
    inclusive_or_expression
    logical_and_expression AND_OP inclusive_or_expression
logical_xor_expression :
    logical_and_expression
    logical_xor_expression XOR_OP logical_and_expression
logical_or_expression :
    logical_xor_expression
    logical_or_expression OR_OP logical_xor_expression
conditional_expression :
    logical_or_expression
    logical_or_expression QUESTION expression COLON assignment_expression
assignment_expression :
    conditional_expression
    unary_expression assignment_operator assignment_expression
assignment_operator :
    EQUAL
    MUL_ASSIGN
    DIV_ASSIGN
    MOD_ASSIGN
    ADD_ASSIGN
    SUB_ASSIGN
    LEFT_ASSIGN
    RIGHT_ASSIGN
    AND_ASSIGN
    XOR_ASSIGN
    OR_ASSIGN
expression :
    assignment_expression
    expression COMMA assignment_expression
constant_expression :
    conditional_expression
declaration :
    function_prototype SEMICOLON
    init_declarator_list SEMICOLON
    PRECISION precision_qualifier type_specifier SEMICOLON
    type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE SEMICOLON
    type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER SEMICOLON
    type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER array_specifier SEMICOLON
    type_qualifier SEMICOLON
    type_qualifier IDENTIFIER SEMICOLON
    type_qualifier IDENTIFIER identifier_list SEMICOLON
identifier_list :
    COMMA IDENTIFIER
    identifier_list COMMA IDENTIFIER
function_prototype :
    function_declarator RIGHT_PAREN
function_declarator :
    function_header
    function_header_with_parameters
function_header_with_parameters :
    function_header parameter_declaration
    function_header_with_parameters COMMA parameter_declaration
function_header :
    fully_specified_type IDENTIFIER LEFT_PAREN
parameter_declarator :
    type_specifier IDENTIFIER
    type_specifier IDENTIFIER array_specifier
parameter_declaration :
    type_qualifier parameter_declarator
    parameter_declarator
    type_qualifier parameter_type_specifier
    parameter_type_specifier
parameter_type_specifier :
    type_specifier
init_declarator_list :
    single_declaration
    init_declarator_list COMMA IDENTIFIER
    init_declarator_list COMMA IDENTIFIER array_specifier
    init_declarator_list COMMA IDENTIFIER array_specifier EQUAL initializer
    init_declarator_list COMMA IDENTIFIER EQUAL initializer
single_declaration :
    fully_specified_type
    fully_specified_type IDENTIFIER
    fully_specified_type IDENTIFIER array_specifier
    fully_specified_type IDENTIFIER array_specifier EQUAL initializer
    fully_specified_type IDENTIFIER EQUAL initializer
fully_specified_type :
    type_specifier
    type_qualifier type_specifier
invariant_qualifier :
    INVARIANT
interpolation_qualifier :
    SMOOTH
    FLAT
    NOPERSPECTIVE
layout_qualifier :
    LAYOUT LEFT_PAREN layout_qualifier_id_list RIGHT_PAREN
layout_qualifier_id_list :
    layout_qualifier_id
    layout_qualifier_id_list COMMA layout_qualifier_id
layout_qualifier_id :
    IDENTIFIER
    IDENTIFIER EQUAL constant_expression
    SHARED
precise_qualifier :
    PRECISE
type_qualifier :
    single_type_qualifier
    type_qualifier single_type_qualifier
single_type_qualifier :
    storage_qualifier
    layout_qualifier
    precision_qualifier
    interpolation_qualifier
    invariant_qualifier
    precise_qualifier
storage_qualifier :
    CONST
    IN
    OUT
    INOUT
    CENTROID
    PATCH
    SAMPLE
    UNIFORM
    BUFFER
    SHARED
    COHERENT
    VOLATILE
    RESTRICT
    READONLY
    WRITEONLY
    SUBROUTINE
    SUBROUTINE LEFT_PAREN type_name_list RIGHT_PAREN
type_name_list :
    TYPE_NAME
    type_name_list COMMA TYPE_NAME
type_specifier :
    type_specifier_nonarray
    type_specifier_nonarray array_specifier
array_specifier :
    LEFT_BRACKET RIGHT_BRACKET
    LEFT_BRACKET constant_expression RIGHT_BRACKET
    array_specifier LEFT_BRACKET RIGHT_BRACKET
    array_specifier LEFT_BRACKET constant_expression RIGHT_BRACKET
type_specifier_nonarray :
    VOID
    FLOAT
    DOUBLE
    INT
    UINT
    BOOL
    VEC2
    VEC3
    VEC4
    DVEC2
    DVEC3
    DVEC4
    BVEC2
    BVEC3
    BVEC4
    IVEC2
    IVEC3
    IVEC4
    UVEC2
    UVEC3
    UVEC4
    MAT2
    MAT3
    MAT4
    MAT2X2
    MAT2X3
    MAT2X4
    MAT3X2
    MAT3X3
    MAT3X4
    MAT4X2
    MAT4X3
    MAT4X4
    DMAT2
    DMAT3
    DMAT4
    DMAT2X2
    DMAT2X3
    DMAT2X4
    DMAT3X2
    DMAT3X3
    DMAT3X4
    DMAT4X2
    DMAT4X3
    DMAT4X4
    ATOMIC_UINT
    SAMPLER2D
    SAMPLER3D
    SAMPLERCUBE
    SAMPLER2DSHADOW
    SAMPLERCUBESHADOW
    SAMPLER2DARRAY
    SAMPLER2DARRAYSHADOW
    SAMPLERCUBEARRAY
    SAMPLERCUBEARRAYSHADOW
    ISAMPLER2D
    ISAMPLER3D
    ISAMPLERCUBE
    ISAMPLER2DARRAY
    ISAMPLERCUBEARRAY
    USAMPLER2D
    USAMPLER3D
    USAMPLERCUBE
    USAMPLER2DARRAY
    USAMPLERCUBEARRAY
    SAMPLER1D
    SAMPLER1DSHADOW
    SAMPLER1DARRAY
    SAMPLER1DARRAYSHADOW
    ISAMPLER1D
    ISAMPLER1DARRAY
    USAMPLER1D
    USAMPLER1DARRAY
    SAMPLER2DRECT
    SAMPLER2DRECTSHADOW
    ISAMPLER2DRECT
    USAMPLER2DRECT
    SAMPLERBUFFER
    ISAMPLERBUFFER
    USAMPLERBUFFER
    SAMPLER2DMS
    ISAMPLER2DMS
    USAMPLER2DMS
    SAMPLER2DMSARRAY
    ISAMPLER2DMSARRAY
    USAMPLER2DMSARRAY
    IMAGE2D
    IIMAGE2D
    UIMAGE2D
    IMAGE3D
    IIMAGE3D
    UIMAGE3D
    IMAGECUBE
    IIMAGECUBE
    UIMAGECUBE
    IMAGEBUFFER
    IIMAGEBUFFER
    UIMAGEBUFFER
    IMAGE1D
    IIMAGE1D
    UIMAGE1D
    IMAGE1DARRAY
    IIMAGE1DARRAY
    UIMAGE1DARRAY
    IMAGE2DRECT
    IIMAGE2DRECT
    UIMAGE2DRECT
    IMAGE2DARRAY
    IIMAGE2DARRAY
    UIMAGE2DARRAY
    IMAGECUBEARRAY
    IIMAGECUBEARRAY
    UIMAGECUBEARRAY
    IMAGE2DMS
    IIMAGE2DMS
    UIMAGE2DMS
    IMAGE2DMSARRAY
    IIMAGE2DMSARRAY
    UIMAGE2DMSARRAY
    struct_specifier
    TYPE_NAME
precision_qualifier :
    HIGH_PRECISION
    MEDIUM_PRECISION
    LOW_PRECISION
struct_specifier :
    STRUCT IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE
    STRUCT LEFT_BRACE struct_declaration_list RIGHT_BRACE
struct_declaration_list :
    struct_declaration
    struct_declaration_list struct_declaration
struct_declaration :
    type_specifier struct_declarator_list SEMICOLON
    type_qualifier type_specifier struct_declarator_list SEMICOLON
struct_declarator_list :
    struct_declarator
    struct_declarator_list COMMA struct_declarator
struct_declarator :
    IDENTIFIER
    IDENTIFIER array_specifier
initializer :
    assignment_expression
    LEFT_BRACE initializer_list RIGHT_BRACE
    LEFT_BRACE initializer_list COMMA RIGHT_BRACE
initializer_list :
    initializer
    initializer_list COMMA initializer
declaration_statement :
    declaration
statement :
    compound_statement
    simple_statement
simple_statement :
    declaration_statement
    expression_statement
    selection_statement
    switch_statement
    case_label
    iteration_statement
    jump_statement
compound_statement :
    LEFT_BRACE RIGHT_BRACE
    LEFT_BRACE statement_list RIGHT_BRACE
statement_no_new_scope :
    compound_statement_no_new_scope
    simple_statement
compound_statement_no_new_scope :
    LEFT_BRACE RIGHT_BRACE
    LEFT_BRACE statement_list RIGHT_BRACE
statement_list :
    statement
    statement_list statement
expression_statement :
    SEMICOLON
    expression SEMICOLON
selection_statement :
    IF LEFT_PAREN expression RIGHT_PAREN selection_rest_statement
selection_rest_statement :
    statement ELSE statement
    statement
condition :
    expression
    fully_specified_type IDENTIFIER EQUAL initializer
switch_statement :
    SWITCH LEFT_PAREN expression RIGHT_PAREN LEFT_BRACE switch_statement_list RIGHT_BRACE
switch_statement_list :
    /* nothing */
    statement_list
case_label :
    CASE expression COLON
    DEFAULT COLON
iteration_statement :
    WHILE LEFT_PAREN condition RIGHT_PAREN statement_no_new_scope
    DO statement WHILE LEFT_PAREN expression RIGHT_PAREN SEMICOLON
    FOR LEFT_PAREN for_init_statement for_rest_statement RIGHT_PAREN statement_no_new_scope
for_init_statement :
    expression_statement
    declaration_statement
conditionopt :
    condition
    /* empty */
for_rest_statement :
    conditionopt SEMICOLON
    conditionopt SEMICOLON expression
jump_statement :
    CONTINUE SEMICOLON
    BREAK SEMICOLON
    RETURN SEMICOLON
    RETURN expression SEMICOLON
    DISCARD SEMICOLON // Fragment shader only
translation_unit :
    external_declaration
    translation_unit external_declaration
external_declaration :
    function_definition
    declaration
    SEMICOLON
function_definition :
    function_prototype compound_statement_no_new_scope

GLSLangSpec.4.60.lark_ebnf

variable_identifier : IDENTIFIER
primary_expression : variable_identifier
    | INTCONSTANT
    | UINTCONSTANT
    | FLOATCONSTANT
    | BOOLCONSTANT
    | DOUBLECONSTANT
    | LEFT_PAREN expression RIGHT_PAREN
postfix_expression : primary_expression
    | postfix_expression LEFT_BRACKET integer_expression RIGHT_BRACKET
    | function_call
    | postfix_expression DOT FIELD_SELECTION
    | postfix_expression INC_OP
    | postfix_expression DEC_OP
integer_expression : expression
function_call : function_call_or_method
function_call_or_method : function_call_generic
function_call_generic : function_call_header_with_parameters RIGHT_PAREN
    | function_call_header_no_parameters RIGHT_PAREN
function_call_header_no_parameters : function_call_header VOID
    | function_call_header
function_call_header_with_parameters : function_call_header assignment_expression
    | function_call_header_with_parameters COMMA assignment_expression
function_call_header : function_identifier LEFT_PAREN
function_identifier : type_specifier
    | postfix_expression
unary_expression : postfix_expression
    | INC_OP unary_expression
    | DEC_OP unary_expression
    | unary_operator unary_expression
unary_operator : PLUS
    | DASH
    | BANG
    | TILDE
multiplicative_expression : unary_expression
    | multiplicative_expression STAR unary_expression
    | multiplicative_expression SLASH unary_expression
    | multiplicative_expression PERCENT unary_expression
additive_expression : multiplicative_expression
    | additive_expression PLUS multiplicative_expression
    | additive_expression DASH multiplicative_expression
shift_expression : additive_expression
    | shift_expression LEFT_OP additive_expression
    | shift_expression RIGHT_OP additive_expression
relational_expression : shift_expression
    | relational_expression LEFT_ANGLE shift_expression
    | relational_expression RIGHT_ANGLE shift_expression
    | relational_expression LE_OP shift_expression
    | relational_expression GE_OP shift_expression
equality_expression : relational_expression
    | equality_expression EQ_OP relational_expression
    | equality_expression NE_OP relational_expression
and_expression : equality_expression
    | and_expression AMPERSAND equality_expression
exclusive_or_expression : and_expression
    | exclusive_or_expression CARET and_expression
inclusive_or_expression : exclusive_or_expression
    | inclusive_or_expression VERTICAL_BAR exclusive_or_expression
logical_and_expression : inclusive_or_expression
    | logical_and_expression AND_OP inclusive_or_expression
logical_xor_expression : logical_and_expression
    | logical_xor_expression XOR_OP logical_and_expression
logical_or_expression : logical_xor_expression
    | logical_or_expression OR_OP logical_xor_expression
conditional_expression : logical_or_expression
    | logical_or_expression QUESTION expression COLON assignment_expression
assignment_expression : conditional_expression
    | unary_expression assignment_operator assignment_expression
assignment_operator : EQUAL
    | MUL_ASSIGN
    | DIV_ASSIGN
    | MOD_ASSIGN
    | ADD_ASSIGN
    | SUB_ASSIGN
    | LEFT_ASSIGN
    | RIGHT_ASSIGN
    | AND_ASSIGN
    | XOR_ASSIGN
    | OR_ASSIGN
expression : assignment_expression
    | expression COMMA assignment_expression
constant_expression : conditional_expression
declaration : function_prototype SEMICOLON
    | init_declarator_list SEMICOLON
    | PRECISION precision_qualifier type_specifier SEMICOLON
    | type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE SEMICOLON
    | type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER SEMICOLON
    | type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER array_specifier SEMICOLON
    | type_qualifier SEMICOLON
    | type_qualifier IDENTIFIER SEMICOLON
    | type_qualifier IDENTIFIER identifier_list SEMICOLON
identifier_list : COMMA IDENTIFIER
    | identifier_list COMMA IDENTIFIER
function_prototype : function_declarator RIGHT_PAREN
function_declarator : function_header
    | function_header_with_parameters
function_header_with_parameters : function_header parameter_declaration
    | function_header_with_parameters COMMA parameter_declaration
function_header : fully_specified_type IDENTIFIER LEFT_PAREN
parameter_declarator : type_specifier IDENTIFIER
    | type_specifier IDENTIFIER array_specifier
parameter_declaration : type_qualifier parameter_declarator
    | parameter_declarator
    | type_qualifier parameter_type_specifier
    | parameter_type_specifier
parameter_type_specifier : type_specifier
init_declarator_list : single_declaration
    | init_declarator_list COMMA IDENTIFIER
    | init_declarator_list COMMA IDENTIFIER array_specifier
    | init_declarator_list COMMA IDENTIFIER array_specifier EQUAL initializer
    | init_declarator_list COMMA IDENTIFIER EQUAL initializer
single_declaration : fully_specified_type
    | fully_specified_type IDENTIFIER
    | fully_specified_type IDENTIFIER array_specifier
    | fully_specified_type IDENTIFIER array_specifier EQUAL initializer
    | fully_specified_type IDENTIFIER EQUAL initializer
fully_specified_type : type_specifier
    | type_qualifier type_specifier
invariant_qualifier : INVARIANT
interpolation_qualifier : SMOOTH
    | FLAT
    | NOPERSPECTIVE
layout_qualifier : LAYOUT LEFT_PAREN layout_qualifier_id_list RIGHT_PAREN
layout_qualifier_id_list : layout_qualifier_id
    | layout_qualifier_id_list COMMA layout_qualifier_id
layout_qualifier_id : IDENTIFIER
    | IDENTIFIER EQUAL constant_expression
    | SHARED
precise_qualifier : PRECISE
type_qualifier : single_type_qualifier
    | type_qualifier single_type_qualifier
single_type_qualifier : storage_qualifier
    | layout_qualifier
    | precision_qualifier
    | interpolation_qualifier
    | invariant_qualifier
    | precise_qualifier
storage_qualifier : CONST
    | IN
    | OUT
    | INOUT
    | CENTROID
    | PATCH
    | SAMPLE
    | UNIFORM
    | BUFFER
    | SHARED
    | COHERENT
    | VOLATILE
    | RESTRICT
    | READONLY
    | WRITEONLY
    | SUBROUTINE
    | SUBROUTINE LEFT_PAREN type_name_list RIGHT_PAREN
type_name_list : TYPE_NAME
    | type_name_list COMMA TYPE_NAME
type_specifier : type_specifier_nonarray
    | type_specifier_nonarray array_specifier
array_specifier : LEFT_BRACKET RIGHT_BRACKET
    | LEFT_BRACKET constant_expression RIGHT_BRACKET
    | array_specifier LEFT_BRACKET RIGHT_BRACKET
    | array_specifier LEFT_BRACKET constant_expression RIGHT_BRACKET
type_specifier_nonarray : VOID
    | FLOAT
    | DOUBLE
    | INT
    | UINT
    | BOOL
    | VEC2
    | VEC3
    | VEC4
    | DVEC2
    | DVEC3
    | DVEC4
    | BVEC2
    | BVEC3
    | BVEC4
    | IVEC2
    | IVEC3
    | IVEC4
    | UVEC2
    | UVEC3
    | UVEC4
    | MAT2
    | MAT3
    | MAT4
    | MAT2X2
    | MAT2X3
    | MAT2X4
    | MAT3X2
    | MAT3X3
    | MAT3X4
    | MAT4X2
    | MAT4X3
    | MAT4X4
    | DMAT2
    | DMAT3
    | DMAT4
    | DMAT2X2
    | DMAT2X3
    | DMAT2X4
    | DMAT3X2
    | DMAT3X3
    | DMAT3X4
    | DMAT4X2
    | DMAT4X3
    | DMAT4X4
    | ATOMIC_UINT
    | SAMPLER2D
    | SAMPLER3D
    | SAMPLERCUBE
    | SAMPLER2DSHADOW
    | SAMPLERCUBESHADOW
    | SAMPLER2DARRAY
    | SAMPLER2DARRAYSHADOW
    | SAMPLERCUBEARRAY
    | SAMPLERCUBEARRAYSHADOW
    | ISAMPLER2D
    | ISAMPLER3D
    | ISAMPLERCUBE
    | ISAMPLER2DARRAY
    | ISAMPLERCUBEARRAY
    | USAMPLER2D
    | USAMPLER3D
    | USAMPLERCUBE
    | USAMPLER2DARRAY
    | USAMPLERCUBEARRAY
    | SAMPLER1D
    | SAMPLER1DSHADOW
    | SAMPLER1DARRAY
    | SAMPLER1DARRAYSHADOW
    | ISAMPLER1D
    | ISAMPLER1DARRAY
    | USAMPLER1D
    | USAMPLER1DARRAY
    | SAMPLER2DRECT
    | SAMPLER2DRECTSHADOW
    | ISAMPLER2DRECT
    | USAMPLER2DRECT
    | SAMPLERBUFFER
    | ISAMPLERBUFFER
    | USAMPLERBUFFER
    | SAMPLER2DMS
    | ISAMPLER2DMS
    | USAMPLER2DMS
    | SAMPLER2DMSARRAY
    | ISAMPLER2DMSARRAY
    | USAMPLER2DMSARRAY
    | IMAGE2D
    | IIMAGE2D
    | UIMAGE2D
    | IMAGE3D
    | IIMAGE3D
    | UIMAGE3D
    | IMAGECUBE
    | IIMAGECUBE
    | UIMAGECUBE
    | IMAGEBUFFER
    | IIMAGEBUFFER
    | UIMAGEBUFFER
    | IMAGE1D
    | IIMAGE1D
    | UIMAGE1D
    | IMAGE1DARRAY
    | IIMAGE1DARRAY
    | UIMAGE1DARRAY
    | IMAGE2DRECT
    | IIMAGE2DRECT
    | UIMAGE2DRECT
    | IMAGE2DARRAY
    | IIMAGE2DARRAY
    | UIMAGE2DARRAY
    | IMAGECUBEARRAY
    | IIMAGECUBEARRAY
    | UIMAGECUBEARRAY
    | IMAGE2DMS
    | IIMAGE2DMS
    | UIMAGE2DMS
    | IMAGE2DMSARRAY
    | IIMAGE2DMSARRAY
    | UIMAGE2DMSARRAY
    | struct_specifier
    | TYPE_NAME
precision_qualifier : HIGH_PRECISION
    | MEDIUM_PRECISION
    | LOW_PRECISION
struct_specifier : STRUCT IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE
    | STRUCT LEFT_BRACE struct_declaration_list RIGHT_BRACE
struct_declaration_list : struct_declaration
    | struct_declaration_list struct_declaration
struct_declaration : type_specifier struct_declarator_list SEMICOLON
    | type_qualifier type_specifier struct_declarator_list SEMICOLON
struct_declarator_list : struct_declarator
    | struct_declarator_list COMMA struct_declarator
struct_declarator : IDENTIFIER
    | IDENTIFIER array_specifier
initializer : assignment_expression
    | LEFT_BRACE initializer_list RIGHT_BRACE
    | LEFT_BRACE initializer_list COMMA RIGHT_BRACE
initializer_list : initializer
    | initializer_list COMMA initializer
declaration_statement : declaration
statement : compound_statement
    | simple_statement
simple_statement : declaration_statement
    | expression_statement
    | selection_statement
    | switch_statement
    | case_label
    | iteration_statement
    | jump_statement
compound_statement : LEFT_BRACE RIGHT_BRACE
    | LEFT_BRACE statement_list RIGHT_BRACE
statement_no_new_scope : compound_statement_no_new_scope
    | simple_statement
compound_statement_no_new_scope : LEFT_BRACE RIGHT_BRACE
    | LEFT_BRACE statement_list RIGHT_BRACE
statement_list : statement
    | statement_list statement
expression_statement : SEMICOLON
    | expression SEMICOLON
selection_statement : IF LEFT_PAREN expression RIGHT_PAREN selection_rest_statement
selection_rest_statement : statement ELSE statement
    | statement
condition : expression
    | fully_specified_type IDENTIFIER EQUAL initializer
switch_statement : SWITCH LEFT_PAREN expression RIGHT_PAREN LEFT_BRACE switch_statement_list RIGHT_BRACE
switch_statement_list : /* nothing */ 
    | statement_list
case_label : CASE expression COLON
    | DEFAULT COLON
iteration_statement : WHILE LEFT_PAREN condition RIGHT_PAREN statement_no_new_scope
    | DO statement WHILE LEFT_PAREN expression RIGHT_PAREN SEMICOLON
    | FOR LEFT_PAREN for_init_statement for_rest_statement RIGHT_PAREN statement_no_new_scope
for_init_statement : expression_statement
    | declaration_statement
conditionopt : condition
    | /* empty */
for_rest_statement : conditionopt SEMICOLON
    | conditionopt SEMICOLON expression
jump_statement : CONTINUE SEMICOLON
    | BREAK SEMICOLON
    | RETURN SEMICOLON
    | RETURN expression SEMICOLON
    | DISCARD SEMICOLON // Fragment shader only
translation_unit : external_declaration
    | translation_unit external_declaration
external_declaration : function_definition
    | declaration
    | SEMICOLON
function_definition : function_prototype compound_statement_no_new_scope

Now, you tell me, which one is easier to understand and work with (improving, tweaking, ...)?

You can also see a ST side by side comparison here

But I'm worried that it's not clear enough that each line is a different option.

@erezsh : Well, my background is many years of coding in C/C++, and when I started using Python many years ago I thought I wouldn't survive without braces... nowadays it's the other way around: each time I see redundant tokens like braces I become sad ;)

@whitten : Just for the record, I think that particular syntax is the worst of the whole set of proposals (worst = least elegant and least clean)

NS: I haven't checked correctness (just a fast adaptation), so not sure if you'd transform this:

conditionopt :
    condition
    /* empty */

like:

conditionopt : condition
    | /* empty */

nor this:

switch_statement_list :
    /* nothing */
    statement_list

like:

switch_statement_list : /* nothing */ 
   | statement_list

guess that's just wrong syntax...

brupelo avatar May 26 '18 14:05 brupelo

Ideally, you'd be able to express the same rule in different ways, for instance:

flow_stmt1: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt

flow_stmt2: 
    break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt

flow_stmt3: 
    break_stmt
    continue_stmt
    return_stmt
    raise_stmt
    yield_stmt

flow_stmt1==flow_stmt2==flow_stmt3.

The idea here is to give users the freedom to write small, compact, hacky grammars if they want to, or cleaner ones (even if a little more verbose).
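
To make the intended equivalence concrete, here is a naive sketch that treats both `|` and a line break as separators between alternatives. The helper `alternatives` is hypothetical, not a lark function, and it splits blindly on `|`, so it would break on string or regexp literals containing that character:

```python
def alternatives(body):
    """Return the alternatives of a rule body, split on '|' and on line breaks."""
    alts = []
    for line in body.splitlines():
        for part in line.split("|"):
            part = " ".join(part.split())  # collapse runs of whitespace
            if part:  # blank pieces are skipped, so empty alternatives are lost
                alts.append(part)
    return alts


# The three rule bodies from above, i.e. everything after the colon
flow_stmt1 = "break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt"
flow_stmt2 = "\n    break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt"
flow_stmt3 = "\n    break_stmt\n    continue_stmt\n    return_stmt\n    raise_stmt\n    yield_stmt"

assert alternatives(flow_stmt1) == alternatives(flow_stmt2) == alternatives(flow_stmt3)
```

Because blank pieces are dropped, this sketch cannot express an empty alternative; that case would need an explicit marker.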

brupelo avatar May 26 '18 15:05 brupelo

Yep, that's what I was planning.

erezsh avatar May 26 '18 17:05 erezsh

I suggest one more way how to write multiple right-sides of the rule.

foo: bar1 | bar2

foo: bar1
foo: bar2

The main reason for such a format (currently it is rejected because the left-hand side is duplicated) is that you can easily add a comment with an explanation for every line. I already have a (naive) preprocessor for that.
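
For illustration, a naive preprocessor of that kind might look like the sketch below. This is not the actual preprocessor mentioned above (its code isn't shown in the thread); the function name `merge_duplicate_rules` is made up, and it assumes each definition fits on a single line of the form `name: body`:

```python
def merge_duplicate_rules(text):
    """Merge repeated 'name: body' lines into one 'name: body1 | body2' line."""
    rules = {}  # dicts keep insertion order, so rules stay in first-seen order
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # blank lines and whole-line comments are dropped
        name, sep, body = line.partition(":")
        if not sep:
            raise ValueError("expected 'name: body', got %r" % line)
        rules.setdefault(name.strip(), []).append(body.strip())
    return "\n".join(
        name + ": " + " | ".join(bodies) for name, bodies in rules.items()
    )


source = """
// first option
foo: bar1
// second option
foo: bar2
"""
print(merge_duplicate_rules(source))
```

The per-line comments live only in the source file here; the merged text that would be handed to Lark no longer contains them.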

marxsk avatar Apr 08 '20 16:04 marxsk

@marxsk Line comments are already possible

>>> from lark import Lark
>>> p=Lark("""
...     // Comment
...     start: "a"
...             // Another comment
...          | "b"
...             // And C
...          | "c"
... """)
>>>

erezsh avatar Apr 08 '20 19:04 erezsh

Could the lark grammar be reimplemented in Lark?

gideongrinberg avatar Jun 03 '21 00:06 gideongrinberg

@gideongrinberg It kinda is. We have a mirror that should match exactly what the actual parser accepts: lark.lark, but that is not what is used internally. Grammars are still parsed with the LALR parser, but the rules are encoded here

MegaIng avatar Jun 03 '21 06:06 MegaIng

I just had a crazy thought. What if Lark accepted a grammar_grammar argument, which would describe the syntax for the grammar? (it should work as long as the structure is the same as lark.lark)

erezsh avatar Jun 03 '21 08:06 erezsh

@erezsh That actually sounds like a good idea. I would suggest that the argument takes a Lark instance (i.e. something that has a .parse(str) -> Tree method). This would allow the grammar to use a Transformer to fix things that don't exactly match between EBNF syntaxes. This system would allow us to easily use EBNF grammars in a different syntax. (The current built-in grammar would have to stay, of course.)

MegaIng avatar Jun 03 '21 09:06 MegaIng

@MegaIng It would be interesting to try! Though seems like it's just a single function, parse(str) -> Tree.

So passing grammar_parser = Lark.open('lark.lark', ...).parse will have no effect, other than extra processing.

It will require a bit of work, because currently there is a deviation of structure between lark.lark and the native grammar loader.

erezsh avatar Jun 03 '21 10:06 erezsh

I just had a crazy thought. What if Lark accepted a grammar_grammar argument, which would describe the syntax for the grammar? (it should work as long as the structure is the same as lark.lark)

We need standards. That's probably a bad idea in case I wanted to read others' code.

BUT it could be a good idea because it allows people who don't know Lark's grammar, but instead something like ANTLR, to be able to write a grammar. We could have a library of possible default grammar-grammar-grammars.

ThatXliner avatar Jun 03 '21 16:06 ThatXliner

@ThatXliner Well, if they produce a Tree that corresponds to lark.lark, I imagine the reconstructor should be able to automatically generate a working lark grammar. In theory, at least.

But I agree that it might become confusing if suddenly everyone used their own syntax.

erezsh avatar Jun 03 '21 16:06 erezsh

I'm concerned that the nonterminal : | terminal | nonterminal ;

Since "[someterminal]" means that someterminal is optional, perhaps we could use the syntax [] to mean an explicit "empty".

I also think that at times it is easier to write the grammar with rules where one of the alternatives is "empty". However, I would much prefer that common.lark declare an EMPTY terminal. It could even be managed as a special case when processing a grammar. (I am pretty sure that EMPTY: "" won't work.)

julie777 avatar Apr 04 '22 17:04 julie777

@brupelo Just my 2 cents about the format of the alternatives that I haven't seen mentioned explicitly.

rule: SOME_TERMINAL some_rule some_other_rule ANOTHER_TERMINAL
        | TERMINAL3

This actually implies parentheses through the line break. The above is actually:

rule: (SOME_TERMINAL some_rule some_other_rule ANOTHER_TERMINAL)     | TERMINAL3

I can understand the thinking that following the rule declaration having the rest of the line empty means that the definition of the rule is an indented block. This is very YAML-like.

Note: I really don't like the ANTLR : | ; notation, and I used ANTLR before it was converted to Java.

Note: about notation, regardless of the multiline format used to define a rule I think it is imperative that the colon always be used to indicate the preceding string is either a rule or terminal.

If it is true that currently having nothing on the line after a rule declaration is an error then allowing the

grammar = '''
    rule : 
        rule1
        rule2
'''

format could be okay. I think that being explicit about what the alternatives are

   rule:
            | rule1
            | rule2

matches the YAML list construct, where a list item starts with "- "; in this case a rule alternative would start with "| ". To me that is a pretty clear meaning: each alternative is listed on a single line (which implies parentheses) and has a preceding indent followed by the alternative tag "| ". It would be one more extension to EBNF that Lark adds.

If the above was added along with a predefined EMPTY terminal and documentation reflected the use of EMPTY when that is what you mean then having nothing after the colon on the line shouldn't be confusing.

Now if Lark supported inline rule definitions (even more like YAML) I would not be happy.

// this would be very bad   :-)
rule:
    | rule2: TERMINAL1 TERMINAL2
    | rule3: TERMINAL3 rule2

julie777 avatar Apr 04 '22 18:04 julie777