lark
Cleaner EBNF grammar
Would it be possible to modify the current EBNF grammar syntax so that, instead of being forced to put the colon and the first alternative on the same line as the rule name:
grammar = '''
rule : rule1
     | rule2
'''
you could have clean, indented blocks like this (the format used in the GLSL specs):
grammar = '''
rule :
    rule1
    rule2
'''
or maybe (a less clean one):
grammar = '''
rule :
    | rule1
    | rule2
'''
or:
grammar = '''
rule
    : rule1
    | rule2
'''
or (inspired by ANTLR4):
grammar = '''
rule
    : rule1
    | rule2
    ;
'''
Rationale: That way the EBNF grammar becomes much more readable. On top of that, you'll be able to fold long grammars easily in your favourite text editor, because the grammar now has proper indentation (example here).
I guess it's a matter of tweaking this file a bit. Even if you don't like the idea, could you explain how you'd do it?
Thanks.
I like these versions:
1.
rule:
    rule1
    rule2
2.
rule
    : rule1
    | rule2
The first one is cleaner and the positions of the alternatives are easy to rearrange. But I'm worried that it's not clear enough that each line is a different option.
The second one is a little bit weird, but it might work.
I'll have to give it some thought, to make sure it won't collide with other concepts.
Yeah, I do agree, the first one is the "optimal" one from the whole set of EBNF grammars (as it doesn't contain redundant/verbose elements, it's the most pythonic one :))
Also, did you look at this repo? It contains a lot of EBNF grammars ready to go. It'd be cool if they could be used out of the box with lark, or maybe converted automatically to lark. I guess adding more arguments to the Lark constructor to specify which type of EBNF you're dealing with would be ugly.
I mean, I guess this is some sort of tradeoff... usually you want your functions/constructor/ui/gui to be as minimal as possible so they become clear as water for users, quoting:
The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification ‐ and then shouldn't be used anyway.
Anyway, just give it some thought. I like to bring new use cases and usability improvements to the table ;)
B.
NS: When I said "converting them automatically", I meant maybe creating some sort of script (without modifying the lark core, so the code doesn't become more complex without any real reason).
it'd be cool if they could be used out of the box with lark, or maybe converting them automatically to lark
Yes, that's a good idea. I'd say converting them is the way to go. However, not all of them can be converted; many of them require code that resides in the grammar to work correctly.
The conversion script can be added to lark.tools.
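As a rough illustration of what such a conversion script might look like, here is a minimal sketch that turns the "one alternative per line" layout into Lark's `|`-separated layout. The function name `indented_to_lark` is hypothetical and not part of lark. The sketch assumes that every rule header is a line of the form `name :` with nothing after the colon, that each following line is exactly one alternative, and that the input has no wrapped lines or comments.

```python
import re

# A rule header is a bare name followed by a colon and nothing else.
RULE_HEADER = re.compile(r"^(\w+)\s*:\s*$")


def indented_to_lark(text):
    """Convert 'name :' headers followed by one alternative per line
    into Lark's 'name: alt1' / '| alt2' layout.

    Hypothetical helper, not part of lark. Wrapped alternatives and
    comments are not handled.
    """
    out = []
    first_alternative = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = RULE_HEADER.match(line)
        if header:
            out.append(header.group(1) + ":")
            first_alternative = True
        elif not out:
            raise ValueError("alternative before any rule header: %r" % line)
        elif first_alternative:
            # The first alternative goes on the same line as the rule name.
            out[-1] += " " + line
            first_alternative = False
        else:
            # Every later alternative gets an explicit '|'.
            out.append("    | " + line)
    return "\n".join(out)


if __name__ == "__main__":
    print(indented_to_lark("unary_operator :\n    PLUS\n    DASH\n    BANG\n"))
```

A real tool would also need to translate comment styles and deal with grammars that embed action code, which is the case mentioned above that cannot be converted mechanically.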
I'm concerned that nonterminal : | terminal | nonterminal ; could be understood as having an "empty" right-hand side. I think we should have some way of being explicit that the right-hand side matches "nothing", i.e. no input. Since "[someterminal]" means that someterminal is optional, perhaps we could use the syntax [] to mean an optional "empty".
Guys, just to make my point clearer (sometimes the best way to prove something is by presenting visual samples), I want you to take a look at the comparison below between the syntax used here and lark's:
GLSLangSpec.4.60.original_ebnf
variable_identifier :
IDENTIFIER
primary_expression :
variable_identifier
INTCONSTANT
UINTCONSTANT
FLOATCONSTANT
BOOLCONSTANT
DOUBLECONSTANT
LEFT_PAREN expression RIGHT_PAREN
postfix_expression :
primary_expression
postfix_expression LEFT_BRACKET integer_expression RIGHT_BRACKET
function_call
postfix_expression DOT FIELD_SELECTION
postfix_expression INC_OP
postfix_expression DEC_OP
integer_expression :
expression
function_call :
function_call_or_method
function_call_or_method :
function_call_generic
function_call_generic :
function_call_header_with_parameters RIGHT_PAREN
function_call_header_no_parameters RIGHT_PAREN
function_call_header_no_parameters :
function_call_header VOID
function_call_header
function_call_header_with_parameters :
function_call_header assignment_expression
function_call_header_with_parameters COMMA assignment_expression
function_call_header :
function_identifier LEFT_PAREN
function_identifier :
type_specifier
postfix_expression
unary_expression :
postfix_expression
INC_OP unary_expression
DEC_OP unary_expression
unary_operator unary_expression
unary_operator :
PLUS
DASH
BANG
TILDE
multiplicative_expression :
unary_expression
multiplicative_expression STAR unary_expression
multiplicative_expression SLASH unary_expression
multiplicative_expression PERCENT unary_expression
additive_expression :
multiplicative_expression
additive_expression PLUS multiplicative_expression
additive_expression DASH multiplicative_expression
shift_expression :
additive_expression
shift_expression LEFT_OP additive_expression
shift_expression RIGHT_OP additive_expression
relational_expression :
shift_expression
relational_expression LEFT_ANGLE shift_expression
relational_expression RIGHT_ANGLE shift_expression
relational_expression LE_OP shift_expression
relational_expression GE_OP shift_expression
equality_expression :
relational_expression
equality_expression EQ_OP relational_expression
equality_expression NE_OP relational_expression
and_expression :
equality_expression
and_expression AMPERSAND equality_expression
exclusive_or_expression :
and_expression
exclusive_or_expression CARET and_expression
inclusive_or_expression :
exclusive_or_expression
inclusive_or_expression VERTICAL_BAR exclusive_or_expression
logical_and_expression :
inclusive_or_expression
logical_and_expression AND_OP inclusive_or_expression
logical_xor_expression :
logical_and_expression
logical_xor_expression XOR_OP logical_and_expression
logical_or_expression :
logical_xor_expression
logical_or_expression OR_OP logical_xor_expression
conditional_expression :
logical_or_expression
logical_or_expression QUESTION expression COLON assignment_expression
assignment_expression :
conditional_expression
unary_expression assignment_operator assignment_expression
assignment_operator :
EQUAL
MUL_ASSIGN
DIV_ASSIGN
MOD_ASSIGN
ADD_ASSIGN
SUB_ASSIGN
LEFT_ASSIGN
RIGHT_ASSIGN
AND_ASSIGN
XOR_ASSIGN
OR_ASSIGN
expression :
assignment_expression
expression COMMA assignment_expression
constant_expression :
conditional_expression
declaration :
function_prototype SEMICOLON
init_declarator_list SEMICOLON
PRECISION precision_qualifier type_specifier SEMICOLON
type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE SEMICOLON
type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER SEMICOLON
type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER array_specifier SEMICOLON
type_qualifier SEMICOLON
type_qualifier IDENTIFIER SEMICOLON
type_qualifier IDENTIFIER identifier_list SEMICOLON
identifier_list :
COMMA IDENTIFIER
identifier_list COMMA IDENTIFIER
function_prototype :
function_declarator RIGHT_PAREN
function_declarator :
function_header
function_header_with_parameters
function_header_with_parameters :
function_header parameter_declaration
function_header_with_parameters COMMA parameter_declaration
function_header :
fully_specified_type IDENTIFIER LEFT_PAREN
parameter_declarator :
type_specifier IDENTIFIER
type_specifier IDENTIFIER array_specifier
parameter_declaration :
type_qualifier parameter_declarator
parameter_declarator
type_qualifier parameter_type_specifier
parameter_type_specifier
parameter_type_specifier :
type_specifier
init_declarator_list :
single_declaration
init_declarator_list COMMA IDENTIFIER
init_declarator_list COMMA IDENTIFIER array_specifier
init_declarator_list COMMA IDENTIFIER array_specifier EQUAL initializer
init_declarator_list COMMA IDENTIFIER EQUAL initializer
single_declaration :
fully_specified_type
fully_specified_type IDENTIFIER
fully_specified_type IDENTIFIER array_specifier
fully_specified_type IDENTIFIER array_specifier EQUAL initializer
fully_specified_type IDENTIFIER EQUAL initializer
fully_specified_type :
type_specifier
type_qualifier type_specifier
invariant_qualifier :
INVARIANT
interpolation_qualifier :
SMOOTH
FLAT
NOPERSPECTIVE
layout_qualifier :
LAYOUT LEFT_PAREN layout_qualifier_id_list RIGHT_PAREN
layout_qualifier_id_list :
layout_qualifier_id
layout_qualifier_id_list COMMA layout_qualifier_id
layout_qualifier_id :
IDENTIFIER
IDENTIFIER EQUAL constant_expression
SHARED
precise_qualifier :
PRECISE
type_qualifier :
single_type_qualifier
type_qualifier single_type_qualifier
single_type_qualifier :
storage_qualifier
layout_qualifier
precision_qualifier
interpolation_qualifier
invariant_qualifier
precise_qualifier
storage_qualifier :
CONST
IN
OUT
INOUT
CENTROID
PATCH
SAMPLE
UNIFORM
BUFFER
SHARED
COHERENT
VOLATILE
RESTRICT
READONLY
WRITEONLY
SUBROUTINE
SUBROUTINE LEFT_PAREN type_name_list RIGHT_PAREN
type_name_list :
TYPE_NAME
type_name_list COMMA TYPE_NAME
type_specifier :
type_specifier_nonarray
type_specifier_nonarray array_specifier
array_specifier :
LEFT_BRACKET RIGHT_BRACKET
LEFT_BRACKET constant_expression RIGHT_BRACKET
array_specifier LEFT_BRACKET RIGHT_BRACKET
array_specifier LEFT_BRACKET constant_expression RIGHT_BRACKET
type_specifier_nonarray :
VOID
FLOAT
DOUBLE
INT
UINT
BOOL
VEC2
VEC3
VEC4
DVEC2
DVEC3
DVEC4
BVEC2
BVEC3
BVEC4
IVEC2
IVEC3
IVEC4
UVEC2
UVEC3
UVEC4
MAT2
MAT3
MAT4
MAT2X2
MAT2X3
MAT2X4
MAT3X2
MAT3X3
MAT3X4
MAT4X2
MAT4X3
MAT4X4
DMAT2
DMAT3
DMAT4
DMAT2X2
DMAT2X3
DMAT2X4
DMAT3X2
DMAT3X3
DMAT3X4
DMAT4X2
DMAT4X3
DMAT4X4
ATOMIC_UINT
SAMPLER2D
SAMPLER3D
SAMPLERCUBE
SAMPLER2DSHADOW
SAMPLERCUBESHADOW
SAMPLER2DARRAY
SAMPLER2DARRAYSHADOW
SAMPLERCUBEARRAY
SAMPLERCUBEARRAYSHADOW
ISAMPLER2D
ISAMPLER3D
ISAMPLERCUBE
ISAMPLER2DARRAY
ISAMPLERCUBEARRAY
USAMPLER2D
USAMPLER3D
USAMPLERCUBE
USAMPLER2DARRAY
USAMPLERCUBEARRAY
SAMPLER1D
SAMPLER1DSHADOW
SAMPLER1DARRAY
SAMPLER1DARRAYSHADOW
ISAMPLER1D
ISAMPLER1DARRAY
USAMPLER1D
USAMPLER1DARRAY
SAMPLER2DRECT
SAMPLER2DRECTSHADOW
ISAMPLER2DRECT
USAMPLER2DRECT
SAMPLERBUFFER
ISAMPLERBUFFER
USAMPLERBUFFER
SAMPLER2DMS
ISAMPLER2DMS
USAMPLER2DMS
SAMPLER2DMSARRAY
ISAMPLER2DMSARRAY
USAMPLER2DMSARRAY
IMAGE2D
IIMAGE2D
UIMAGE2D
IMAGE3D
IIMAGE3D
UIMAGE3D
IMAGECUBE
IIMAGECUBE
UIMAGECUBE
IMAGEBUFFER
IIMAGEBUFFER
UIMAGEBUFFER
IMAGE1D
IIMAGE1D
UIMAGE1D
IMAGE1DARRAY
IIMAGE1DARRAY
UIMAGE1DARRAY
IMAGE2DRECT
IIMAGE2DRECT
UIMAGE2DRECT
IMAGE2DARRAY
IIMAGE2DARRAY
UIMAGE2DARRAY
IMAGECUBEARRAY
IIMAGECUBEARRAY
UIMAGECUBEARRAY
IMAGE2DMS
IIMAGE2DMS
UIMAGE2DMS
IMAGE2DMSARRAY
IIMAGE2DMSARRAY
UIMAGE2DMSARRAY
struct_specifier
TYPE_NAME
precision_qualifier :
HIGH_PRECISION
MEDIUM_PRECISION
LOW_PRECISION
struct_specifier :
STRUCT IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE
STRUCT LEFT_BRACE struct_declaration_list RIGHT_BRACE
struct_declaration_list :
struct_declaration
struct_declaration_list struct_declaration
struct_declaration :
type_specifier struct_declarator_list SEMICOLON
type_qualifier type_specifier struct_declarator_list SEMICOLON
struct_declarator_list :
struct_declarator
struct_declarator_list COMMA struct_declarator
struct_declarator :
IDENTIFIER
IDENTIFIER array_specifier
initializer :
assignment_expression
LEFT_BRACE initializer_list RIGHT_BRACE
LEFT_BRACE initializer_list COMMA RIGHT_BRACE
initializer_list :
initializer
initializer_list COMMA initializer
declaration_statement :
declaration
statement :
compound_statement
simple_statement
simple_statement :
declaration_statement
expression_statement
selection_statement
switch_statement
case_label
iteration_statement
jump_statement
compound_statement :
LEFT_BRACE RIGHT_BRACE
LEFT_BRACE statement_list RIGHT_BRACE
statement_no_new_scope :
compound_statement_no_new_scope
simple_statement
compound_statement_no_new_scope :
LEFT_BRACE RIGHT_BRACE
LEFT_BRACE statement_list RIGHT_BRACE
statement_list :
statement
statement_list statement
expression_statement :
SEMICOLON
expression SEMICOLON
selection_statement :
IF LEFT_PAREN expression RIGHT_PAREN selection_rest_statement
selection_rest_statement :
statement ELSE statement
statement
condition :
expression
fully_specified_type IDENTIFIER EQUAL initializer
switch_statement :
SWITCH LEFT_PAREN expression RIGHT_PAREN LEFT_BRACE switch_statement_list RIGHT_BRACE
switch_statement_list :
/* nothing */
statement_list
case_label :
CASE expression COLON
DEFAULT COLON
iteration_statement :
WHILE LEFT_PAREN condition RIGHT_PAREN statement_no_new_scope
DO statement WHILE LEFT_PAREN expression RIGHT_PAREN SEMICOLON
FOR LEFT_PAREN for_init_statement for_rest_statement RIGHT_PAREN statement_no_new_scope
for_init_statement :
expression_statement
declaration_statement
conditionopt :
condition
/* empty */
for_rest_statement :
conditionopt SEMICOLON
conditionopt SEMICOLON expression
jump_statement :
CONTINUE SEMICOLON
BREAK SEMICOLON
RETURN SEMICOLON
RETURN expression SEMICOLON
DISCARD SEMICOLON // Fragment shader only
translation_unit :
external_declaration
translation_unit external_declaration
external_declaration :
function_definition
declaration
SEMICOLON
function_definition :
function_prototype compound_statement_no_new_scope
GLSLangSpec.4.60.lark_ebnf
variable_identifier : IDENTIFIER
primary_expression : variable_identifier
| INTCONSTANT
| UINTCONSTANT
| FLOATCONSTANT
| BOOLCONSTANT
| DOUBLECONSTANT
| LEFT_PAREN expression RIGHT_PAREN
postfix_expression : primary_expression
| postfix_expression LEFT_BRACKET integer_expression RIGHT_BRACKET
| function_call
| postfix_expression DOT FIELD_SELECTION
| postfix_expression INC_OP
| postfix_expression DEC_OP
integer_expression : expression
function_call : function_call_or_method
function_call_or_method : function_call_generic
function_call_generic : function_call_header_with_parameters RIGHT_PAREN
| function_call_header_no_parameters RIGHT_PAREN
function_call_header_no_parameters : function_call_header VOID
| function_call_header
function_call_header_with_parameters : function_call_header assignment_expression
| function_call_header_with_parameters COMMA assignment_expression
function_call_header : function_identifier LEFT_PAREN
function_identifier : type_specifier
| postfix_expression
unary_expression : postfix_expression
| INC_OP unary_expression
| DEC_OP unary_expression
| unary_operator unary_expression
unary_operator : PLUS
| DASH
| BANG
| TILDE
multiplicative_expression : unary_expression
| multiplicative_expression STAR unary_expression
| multiplicative_expression SLASH unary_expression
| multiplicative_expression PERCENT unary_expression
additive_expression : multiplicative_expression
| additive_expression PLUS multiplicative_expression
| additive_expression DASH multiplicative_expression
shift_expression : additive_expression
| shift_expression LEFT_OP additive_expression
| shift_expression RIGHT_OP additive_expression
relational_expression : shift_expression
| relational_expression LEFT_ANGLE shift_expression
| relational_expression RIGHT_ANGLE shift_expression
| relational_expression LE_OP shift_expression
| relational_expression GE_OP shift_expression
equality_expression : relational_expression
| equality_expression EQ_OP relational_expression
| equality_expression NE_OP relational_expression
and_expression : equality_expression
| and_expression AMPERSAND equality_expression
exclusive_or_expression : and_expression
| exclusive_or_expression CARET and_expression
inclusive_or_expression : exclusive_or_expression
| inclusive_or_expression VERTICAL_BAR exclusive_or_expression
logical_and_expression : inclusive_or_expression
| logical_and_expression AND_OP inclusive_or_expression
logical_xor_expression : logical_and_expression
| logical_xor_expression XOR_OP logical_and_expression
logical_or_expression : logical_xor_expression
| logical_or_expression OR_OP logical_xor_expression
conditional_expression : logical_or_expression
| logical_or_expression QUESTION expression COLON assignment_expression
assignment_expression : conditional_expression
| unary_expression assignment_operator assignment_expression
assignment_operator : EQUAL
| MUL_ASSIGN
| DIV_ASSIGN
| MOD_ASSIGN
| ADD_ASSIGN
| SUB_ASSIGN
| LEFT_ASSIGN
| RIGHT_ASSIGN
| AND_ASSIGN
| XOR_ASSIGN
| OR_ASSIGN
expression : assignment_expression
| expression COMMA assignment_expression
constant_expression : conditional_expression
declaration : function_prototype SEMICOLON
| init_declarator_list SEMICOLON
| PRECISION precision_qualifier type_specifier SEMICOLON
| type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE SEMICOLON
| type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER SEMICOLON
| type_qualifier IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE IDENTIFIER array_specifier SEMICOLON
| type_qualifier SEMICOLON
| type_qualifier IDENTIFIER SEMICOLON
| type_qualifier IDENTIFIER identifier_list SEMICOLON
identifier_list : COMMA IDENTIFIER
| identifier_list COMMA IDENTIFIER
function_prototype : function_declarator RIGHT_PAREN
function_declarator : function_header
| function_header_with_parameters
function_header_with_parameters : function_header parameter_declaration
| function_header_with_parameters COMMA parameter_declaration
function_header : fully_specified_type IDENTIFIER LEFT_PAREN
parameter_declarator : type_specifier IDENTIFIER
| type_specifier IDENTIFIER array_specifier
parameter_declaration : type_qualifier parameter_declarator
| parameter_declarator
| type_qualifier parameter_type_specifier
| parameter_type_specifier
parameter_type_specifier : type_specifier
init_declarator_list : single_declaration
| init_declarator_list COMMA IDENTIFIER
| init_declarator_list COMMA IDENTIFIER array_specifier
| init_declarator_list COMMA IDENTIFIER array_specifier EQUAL initializer
| init_declarator_list COMMA IDENTIFIER EQUAL initializer
single_declaration : fully_specified_type
| fully_specified_type IDENTIFIER
| fully_specified_type IDENTIFIER array_specifier
| fully_specified_type IDENTIFIER array_specifier EQUAL initializer
| fully_specified_type IDENTIFIER EQUAL initializer
fully_specified_type : type_specifier
| type_qualifier type_specifier
invariant_qualifier : INVARIANT
interpolation_qualifier : SMOOTH
| FLAT
| NOPERSPECTIVE
layout_qualifier : LAYOUT LEFT_PAREN layout_qualifier_id_list RIGHT_PAREN
layout_qualifier_id_list : layout_qualifier_id
| layout_qualifier_id_list COMMA layout_qualifier_id
layout_qualifier_id : IDENTIFIER
| IDENTIFIER EQUAL constant_expression
| SHARED
precise_qualifier : PRECISE
type_qualifier : single_type_qualifier
| type_qualifier single_type_qualifier
single_type_qualifier : storage_qualifier
| layout_qualifier
| precision_qualifier
| interpolation_qualifier
| invariant_qualifier
| precise_qualifier
storage_qualifier : CONST
| IN
| OUT
| INOUT
| CENTROID
| PATCH
| SAMPLE
| UNIFORM
| BUFFER
| SHARED
| COHERENT
| VOLATILE
| RESTRICT
| READONLY
| WRITEONLY
| SUBROUTINE
| SUBROUTINE LEFT_PAREN type_name_list RIGHT_PAREN
type_name_list : TYPE_NAME
| type_name_list COMMA TYPE_NAME
type_specifier : type_specifier_nonarray
| type_specifier_nonarray array_specifier
array_specifier : LEFT_BRACKET RIGHT_BRACKET
| LEFT_BRACKET constant_expression RIGHT_BRACKET
| array_specifier LEFT_BRACKET RIGHT_BRACKET
| array_specifier LEFT_BRACKET constant_expression RIGHT_BRACKET
type_specifier_nonarray : VOID
| FLOAT
| DOUBLE
| INT
| UINT
| BOOL
| VEC2
| VEC3
| VEC4
| DVEC2
| DVEC3
| DVEC4
| BVEC2
| BVEC3
| BVEC4
| IVEC2
| IVEC3
| IVEC4
| UVEC2
| UVEC3
| UVEC4
| MAT2
| MAT3
| MAT4
| MAT2X2
| MAT2X3
| MAT2X4
| MAT3X2
| MAT3X3
| MAT3X4
| MAT4X2
| MAT4X3
| MAT4X4
| DMAT2
| DMAT3
| DMAT4
| DMAT2X2
| DMAT2X3
| DMAT2X4
| DMAT3X2
| DMAT3X3
| DMAT3X4
| DMAT4X2
| DMAT4X3
| DMAT4X4
| ATOMIC_UINT
| SAMPLER2D
| SAMPLER3D
| SAMPLERCUBE
| SAMPLER2DSHADOW
| SAMPLERCUBESHADOW
| SAMPLER2DARRAY
| SAMPLER2DARRAYSHADOW
| SAMPLERCUBEARRAY
| SAMPLERCUBEARRAYSHADOW
| ISAMPLER2D
| ISAMPLER3D
| ISAMPLERCUBE
| ISAMPLER2DARRAY
| ISAMPLERCUBEARRAY
| USAMPLER2D
| USAMPLER3D
| USAMPLERCUBE
| USAMPLER2DARRAY
| USAMPLERCUBEARRAY
| SAMPLER1D
| SAMPLER1DSHADOW
| SAMPLER1DARRAY
| SAMPLER1DARRAYSHADOW
| ISAMPLER1D
| ISAMPLER1DARRAY
| USAMPLER1D
| USAMPLER1DARRAY
| SAMPLER2DRECT
| SAMPLER2DRECTSHADOW
| ISAMPLER2DRECT
| USAMPLER2DRECT
| SAMPLERBUFFER
| ISAMPLERBUFFER
| USAMPLERBUFFER
| SAMPLER2DMS
| ISAMPLER2DMS
| USAMPLER2DMS
| SAMPLER2DMSARRAY
| ISAMPLER2DMSARRAY
| USAMPLER2DMSARRAY
| IMAGE2D
| IIMAGE2D
| UIMAGE2D
| IMAGE3D
| IIMAGE3D
| UIMAGE3D
| IMAGECUBE
| IIMAGECUBE
| UIMAGECUBE
| IMAGEBUFFER
| IIMAGEBUFFER
| UIMAGEBUFFER
| IMAGE1D
| IIMAGE1D
| UIMAGE1D
| IMAGE1DARRAY
| IIMAGE1DARRAY
| UIMAGE1DARRAY
| IMAGE2DRECT
| IIMAGE2DRECT
| UIMAGE2DRECT
| IMAGE2DARRAY
| IIMAGE2DARRAY
| UIMAGE2DARRAY
| IMAGECUBEARRAY
| IIMAGECUBEARRAY
| UIMAGECUBEARRAY
| IMAGE2DMS
| IIMAGE2DMS
| UIMAGE2DMS
| IMAGE2DMSARRAY
| IIMAGE2DMSARRAY
| UIMAGE2DMSARRAY
| struct_specifier
| TYPE_NAME
precision_qualifier : HIGH_PRECISION
| MEDIUM_PRECISION
| LOW_PRECISION
struct_specifier : STRUCT IDENTIFIER LEFT_BRACE struct_declaration_list RIGHT_BRACE
| STRUCT LEFT_BRACE struct_declaration_list RIGHT_BRACE
struct_declaration_list : struct_declaration
| struct_declaration_list struct_declaration
struct_declaration : type_specifier struct_declarator_list SEMICOLON
| type_qualifier type_specifier struct_declarator_list SEMICOLON
struct_declarator_list : struct_declarator
| struct_declarator_list COMMA struct_declarator
struct_declarator : IDENTIFIER
| IDENTIFIER array_specifier
initializer : assignment_expression
| LEFT_BRACE initializer_list RIGHT_BRACE
| LEFT_BRACE initializer_list COMMA RIGHT_BRACE
initializer_list : initializer
| initializer_list COMMA initializer
declaration_statement : declaration
statement : compound_statement
| simple_statement
simple_statement : declaration_statement
| expression_statement
| selection_statement
| switch_statement
| case_label
| iteration_statement
| jump_statement
compound_statement : LEFT_BRACE RIGHT_BRACE
| LEFT_BRACE statement_list RIGHT_BRACE
statement_no_new_scope : compound_statement_no_new_scope
| simple_statement
compound_statement_no_new_scope : LEFT_BRACE RIGHT_BRACE
| LEFT_BRACE statement_list RIGHT_BRACE
statement_list : statement
| statement_list statement
expression_statement : SEMICOLON
| expression SEMICOLON
selection_statement : IF LEFT_PAREN expression RIGHT_PAREN selection_rest_statement
selection_rest_statement : statement ELSE statement
| statement
condition : expression
| fully_specified_type IDENTIFIER EQUAL initializer
switch_statement : SWITCH LEFT_PAREN expression RIGHT_PAREN LEFT_BRACE switch_statement_list RIGHT_BRACE
switch_statement_list : /* nothing */
| statement_list
case_label : CASE expression COLON
| DEFAULT COLON
iteration_statement : WHILE LEFT_PAREN condition RIGHT_PAREN statement_no_new_scope
| DO statement WHILE LEFT_PAREN expression RIGHT_PAREN SEMICOLON
| FOR LEFT_PAREN for_init_statement for_rest_statement RIGHT_PAREN statement_no_new_scope
for_init_statement : expression_statement
| declaration_statement
conditionopt : condition
| /* empty */
for_rest_statement : conditionopt SEMICOLON
| conditionopt SEMICOLON expression
jump_statement : CONTINUE SEMICOLON
| BREAK SEMICOLON
| RETURN SEMICOLON
| RETURN expression SEMICOLON
| DISCARD SEMICOLON // Fragment shader only
translation_unit : external_declaration
| translation_unit external_declaration
external_declaration : function_definition
| declaration
| SEMICOLON
function_definition : function_prototype compound_statement_no_new_scope
Now, you tell me, which one is easier to understand and work with (improving, tweaking, ...)?
You can also see an ST side-by-side comparison here.
But I'm worried that it's not clear enough that each line is a different option.
@erezsh : Well, my background is many years of coding in C/C++, and when I started using Python many years ago I thought I wouldn't survive without braces. Nowadays it's the other way around: each time I see redundant tokens like braces I become sad ;)
@whitten : Just for the record, I think that particular syntax is the worst of the whole set of proposals (worst = least elegant and least clean).
NS: I haven't checked correctness (it was just a quick adaptation), so I'm not sure whether you'd transform this:
conditionopt :
    condition
    /* empty */
into:
conditionopt : condition
    | /* empty */
or this:
switch_statement_list :
    /* nothing */
    statement_list
into:
switch_statement_list : /* nothing */
    | statement_list
I guess that's just wrong syntax...
Ideally, you'd be able to express the same rule in different ways, for instance:
flow_stmt1: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
flow_stmt2:
    break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
flow_stmt3:
    break_stmt
    continue_stmt
    return_stmt
    raise_stmt
    yield_stmt
where flow_stmt1 == flow_stmt2 == flow_stmt3.
The idea here is to give users the freedom to write small, hacky, compact grammars if they want to, or cleaner ones (even if they are a little more verbose).
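To make the intended equivalence concrete, here is a small sketch of how a reader of such a grammar might normalise the three layouts into the same list of alternatives. It is not how lark parses grammars. The function name `rule_alternatives` is hypothetical, and the sketch assumes the proposed semantics, where both `|` and a line break inside a rule body separate alternatives, and where `|` never appears inside a string literal or regexp.

```python
def rule_alternatives(definition):
    """Split 'name: body' into (name, [alternatives]).

    Hypothetical helper illustrating the proposal: inside the body,
    both '|' and a newline separate alternatives. Empty alternatives
    are dropped.
    """
    name, _, body = definition.partition(":")
    alternatives = []
    for line in body.splitlines():
        for alt in line.split("|"):
            alt = " ".join(alt.split())  # collapse runs of whitespace
            if alt:
                alternatives.append(alt)
    return name.strip(), alternatives


compact = "flow_stmt: break_stmt | continue_stmt | return_stmt"
one_line_body = "flow_stmt:\n    break_stmt | continue_stmt | return_stmt"
one_per_line = "flow_stmt:\n    break_stmt\n    continue_stmt\n    return_stmt"

# All three layouts describe the same rule under the proposed semantics.
assert (rule_alternatives(compact)
        == rule_alternatives(one_line_body)
        == rule_alternatives(one_per_line))
```

Note that treating every line break as a separator means an alternative can no longer be wrapped across lines, which is one of the trade-offs of this layout.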
Yep, that's what I was planning.
I suggest one more way to write multiple right-hand sides of a rule, so that these two are equivalent:
foo: bar1 | bar2
foo: bar1
foo: bar2
The main reason for this format (currently it is rejected because of the duplicate left-hand side) is that you can easily add a comment with an explanation for every line. I already have a (naive) preprocessor for that.
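The preprocessor mentioned above is not shown in the thread. As a guess at what a naive version could look like, here is a minimal sketch that merges repeated left-hand sides into a single rule. The function name `merge_duplicate_rules` is hypothetical. The sketch assumes one complete `name: body` definition per line and `//` line comments, and it drops the comments rather than preserving them.

```python
def merge_duplicate_rules(text):
    """Merge lines that share a left-hand side into one rule joined by '|'.

    Hypothetical, naive preprocessor: assumes one 'name: body' definition
    per line; blank lines and '//' comment lines are dropped.
    """
    merged = {}  # dicts keep insertion order, so rule order is preserved
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        name, sep, body = line.partition(":")
        if not sep:
            raise ValueError("expected 'name: body', got %r" % line)
        merged.setdefault(name.strip(), []).append(body.strip())
    return "\n".join(
        "%s: %s" % (name, " | ".join(bodies)) for name, bodies in merged.items()
    )


if __name__ == "__main__":
    source = """
    // the common case
    foo: bar1
    // only needed for legacy input
    foo: bar2
    """
    print(merge_duplicate_rules(source))
```

Joining the bodies with `|` is safe here because `|` has the lowest precedence in a rule body, so each original line stays a separate alternative.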
@marxsk Line comments are already possible:
>>> from lark import Lark
>>> p=Lark("""
... // Comment
... start: "a"
... // Another comment
... | "b"
... // And C
... | "c"
... """)
>>>
Could the lark grammar be reimplemented in Lark?
@gideongrinberg It kinda is. We have a mirror that should match exactly what the actual parser accepts: lark.lark, but that is not what is being used internally. Grammars are still parsed with the LALR parser, but the rules are encoded here.
I just had a crazy thought. What if Lark accepted a grammar_grammar argument, which would describe the syntax for the grammar? (It should work as long as the structure is the same as lark.lark.)
@erezsh That actually sounds like a good idea. I would suggest that the argument takes a Lark instance (e.g. something that has a .parse(str) -> Tree method). This would allow the grammar to use a Transformer to fix things that don't exactly match between EBNF syntaxes. This system would allow us to easily use EBNF grammars written in a different syntax. (The current built-in grammar would have to stay, of course.)
@MegaIng It would be interesting to try! Though it seems like it's just a single function, parse(str) -> Tree. So passing grammar_parser = Lark.open('lark.lark', ...).parse will have no effect, other than extra processing.
It will require a bit of work, because currently the structure of lark.lark deviates from that of the native grammar loader.
I just had a crazy thought. What if Lark accepted a grammar_grammar argument, which would describe the syntax for the grammar? (it should work as long as the structure is the same as lark.lark)
We need standards. That's probably a bad idea if I want to read others' code,
BUT it could be a good idea because it allows people who don't know Lark's grammar, but know something like ANTLR instead, to write a grammar. We could have a library of possible default grammar-grammar-grammars.
@ThatXliner Well, if they produce a Tree that corresponds to lark.lark, I imagine the reconstructor should be able to automatically generate a working lark grammar. In theory, at least.
But I agree that it might become confusing if suddenly everyone used their own syntax.
I'm concerned that the nonterminal : | terminal | nonterminal ;
Since "[someterminal]" means that the someterminal is optional, perhaps if we use the syntax [] to mean an optional "empty".
I also think that at times it is easier to write the grammar with rules where one of the alternatives is "empty". However, I would much prefer that common.lark declare an EMPTY terminal. It could even be managed as a special case when processing a grammar. (I am pretty sure that EMPTY: "" won't work.)
@brupelo Just my 2 cents about the format of the alternatives, on a point that I haven't seen mentioned explicitly.
rule: SOME_TERMINAL some_rule some_other_rule ANOTHER_TERMINAL
    | TERMINAL3
This actually implies parentheses through the line break. The above is really:
rule: (SOME_TERMINAL some_rule some_other_rule ANOTHER_TERMINAL) | TERMINAL3
I can understand the thinking that following the rule declaration having the rest of the line empty means that the definition of the rule is an indented block. This is very YAML-like.
Note: I really don't like the ANTLR : | ; notation, and I used ANTLR before it was converted to Java.
Note: about notation, regardless of the multiline format used to define a rule I think it is imperative that the colon always be used to indicate the preceding string is either a rule or terminal.
If it is true that having nothing on the line after a rule declaration is currently an error, then allowing the
grammar = '''
rule :
    rule1
    rule2
'''
format could be okay. I think that being explicit about what the alternatives are
rule:
    | rule1
    | rule2
matches the YAML list construct, where a list item starts with "- "; in this case a rule alternative would start with "| ". To me that has a pretty clear meaning: each alternative is listed on a single line (which implies parentheses) and has a preceding indent followed by the alternative tag "| ". It would be one more extension to EBNF that Lark adds.
If the above were added along with a predefined EMPTY terminal, and the documentation reflected the use of EMPTY when that is what you mean, then having nothing after the colon on the line shouldn't be confusing.
Now if Lark supported inline rule definitions (even more like YAML) I would not be happy.
// this would be very bad :-)
rule:
    | rule2: TERMINAL1 TERMINAL2
    | rule3: TERMINAL3 rule2