tree-sitter-cpp
Arbitrary expressions on the LHS of assignments
The grammar currently seems limited in how it handles assignment expressions, in particular in what can appear on the LHS. Arbitrary expressions on the LHS of assignments are a common technique for building DSLs, e.g. Boost.SML.
To take a simple example, a UDL on the LHS:
void f() { "hello"_s = 42; }
(The UDL is effectively a function call, which might return e.g. a reference to a variable.) tree-sitter parse says:
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 28]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 28]
      (ERROR [0, 11] - [0, 22]
        (user_defined_literal [0, 11] - [0, 20]
          (string_literal [0, 11] - [0, 18]
            (string_content [0, 12] - [0, 17]))
          (literal_suffix [0, 18] - [0, 20])))
      (expression_statement [0, 23] - [0, 26]
        (number_literal [0, 23] - [0, 25])))))
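To show why a UDL on the LHS is valid C++, here is a minimal sketch of a literal operator that returns a reference to a variable. The names slot and assign_through_literal, and the behaviour of _s, are made up for illustration; this is not how Boost.SML defines its literals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical storage that the literal refers to (illustration only).
int slot = 0;

// Hypothetical UDL: ignores the string contents and returns a reference to `slot`.
int& operator""_s(const char*, std::size_t) { return slot; }

// Assigns through the reference returned by the literal operator.
int assign_through_literal() {
    "hello"_s = 42;  // same shape as the statement in f() above
    return slot;
}
```

Calling assign_through_literal() stores 42 in slot, so the statement is an ordinary assignment whose left operand happens to be a user-defined literal.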
Another example:
void f() { x + y = 5; }
(If this looks odd, consider a DSL with overloaded operators.) This produces a parse tree without error:
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 23]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 23]
      (expression_statement [0, 11] - [0, 21]
        (binary_expression [0, 11] - [0, 20]
          left: (identifier [0, 11] - [0, 12])
          right: (assignment_expression [0, 15] - [0, 20]
            left: (identifier [0, 15] - [0, 16])
            right: (number_literal [0, 19] - [0, 20])))))))
But the tree is incorrect, with operator= binding more tightly than operator+: it's as if the code read x + (y = 5). Ironically, that expression produces an error:
void f() { x + (y = 5); }
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 25]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 25]
      (expression_statement [0, 11] - [0, 23]
        (binary_expression [0, 11] - [0, 22]
          left: (identifier [0, 11] - [0, 12])
          right: (parenthesized_expression [0, 15] - [0, 22]
            (ERROR [0, 16] - [0, 19]
              (identifier [0, 16] - [0, 17]))
            (number_literal [0, 20] - [0, 21])))))))
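For reference, a C++ compiler groups x + y = 5 as (x + y) = 5, because + binds more tightly than =. The following sketch uses hypothetical DSL types (Term, Sum, and assigned_value are invented here) to make that grouping observable: only the result of x + y accepts an int assignment.

```cpp
#include <cassert>

// Hypothetical DSL types, for illustration only.
struct Sum {
    int assigned = 0;
    // Assignment on the object produced by x + y; records the value.
    Sum& operator=(int v) { assigned = v; return *this; }
};

struct Term {};

// x + y yields a Sum that can itself be assigned to.
Sum operator+(Term, Term) { return Sum{}; }

int assigned_value() {
    Term x{}, y{};
    // Groups as (x + y) = 5. Term has no operator=(int),
    // so the grouping x + (y = 5) would not compile with these types.
    return (x + y = 5).assigned;
}
```

With these types the expression only compiles under the (x + y) = 5 grouping, which is the tree shape one would expect the grammar to produce.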
I'm not sure what should be done about this in general: perhaps some surgery on the LHS of assignment? I see that parenthesized expressions are allowed there by the current grammar, so this parses:
void f() { ("hello"_s) = 42; }
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 30]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 30]
      (expression_statement [0, 11] - [0, 28]
        (assignment_expression [0, 11] - [0, 27]
          left: (parenthesized_expression [0, 11] - [0, 22]
            (user_defined_literal [0, 12] - [0, 21]
              (string_literal [0, 12] - [0, 19]
                (string_content [0, 13] - [0, 18]))
              (literal_suffix [0, 19] - [0, 21])))
          right: (number_literal [0, 25] - [0, 27]))))))
The first case should be fixed, I'll agree. About binary expressions - that's just asking for a mess of conflicts. So, would just fixing the first case be good enough?
Long term, IMO no. But it's a start.