tree-sitter-cpp

Arbitrary expressions on the LHS of assignments

Open elbeno opened this issue 1 year ago • 2 comments

The grammar currently seems limited in how it handles assignment expressions, in particular in what can appear on the LHS. Arbitrary expressions on the LHS of assignments are a common technique for building DSLs such as Boost.SML.

To take a simple example, a UDL on the LHS:

void f() { "hello"_s = 42; }

(The UDL is effectively a function call which might return e.g. a reference to a variable.) tree-sitter parse says:

(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 28]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 28]
      (ERROR [0, 11] - [0, 22]
        (user_defined_literal [0, 11] - [0, 20]
          (string_literal [0, 11] - [0, 18]
            (string_content [0, 12] - [0, 17]))
          (literal_suffix [0, 18] - [0, 20])))
      (expression_statement [0, 23] - [0, 26]
        (number_literal [0, 23] - [0, 25])))))
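
For context, here is a minimal sketch of why a UDL on the LHS is valid C++. The `registry` helper and the behavior of `_s` are hypothetical and chosen only for illustration (this is not how Boost.SML defines its literals); the point is just that a literal operator may return an lvalue reference, which makes the literal assignable.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical storage: maps a name to an integer slot (illustration only).
inline std::map<std::string, int>& registry() {
    static std::map<std::string, int> slots;
    return slots;
}

// Hypothetical UDL: returns a reference to the slot named by the literal.
inline int& operator""_s(const char* str, std::size_t len) {
    return registry()[std::string(str, len)];
}

inline int demo_udl_assignment() {
    "hello"_s = 42;                // assigns through the returned reference
    return registry()["hello"];    // reads the same slot back
}
```

Under these assumptions `demo_udl_assignment()` returns 42, so `"hello"_s = 42;` is an ordinary assignment expression as far as the language is concerned.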

Another example:

void f() { x + y = 5; }

(If this looks odd, consider a DSL with overloaded operators.) This produces a parse tree without error:

(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 23]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 23]
      (expression_statement [0, 11] - [0, 21]
        (binary_expression [0, 11] - [0, 20]
          left: (identifier [0, 11] - [0, 12])
          right: (assignment_expression [0, 15] - [0, 20]
            left: (identifier [0, 15] - [0, 16])
            right: (number_literal [0, 19] - [0, 20])))))))

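But the tree is incorrect: it has operator= binding more tightly than operator+, as if the code read x + (y = 5). In C++, assignment has lower precedence than +, so the statement should group as (x + y) = 5. Ironically, the explicitly parenthesized form produces an error: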

void f() { x + (y = 5); }
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 25]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 25]
      (expression_statement [0, 11] - [0, 23]
        (binary_expression [0, 11] - [0, 22]
          left: (identifier [0, 11] - [0, 12])
          right: (parenthesized_expression [0, 15] - [0, 22]
            (ERROR [0, 16] - [0, 19]
              (identifier [0, 16] - [0, 17]))
            (number_literal [0, 20] - [0, 21])))))))
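
As a rough illustration of the grouping a C++ compiler applies to the `x + y = 5` example, here is a small sketch with overloaded operators. `Term`, `Sum`, and `demo_grouping` are hypothetical names invented for this example and are not taken from any real DSL.

```cpp
#include <string>

// Hypothetical DSL operand (illustration only).
struct Term {
    std::string name;
};

// Hypothetical result of Term + Term; it accepts assignment from an int.
struct Sum {
    std::string text;
    // Records the whole "rule" as text so the grouping is visible.
    std::string operator=(int value) const {
        return "(" + text + ") = " + std::to_string(value);
    }
};

inline Sum operator+(const Term& a, const Term& b) {
    return Sum{a.name + " + " + b.name};
}

inline std::string demo_grouping() {
    Term x{"x"}, y{"y"};
    // Groups as (x + y) = 5. Term has no assignment from int, so the
    // grouping x + (y = 5) would not compile with these types.
    return x + y = 5;
}
```

With these definitions `demo_grouping()` returns the string `(x + y) = 5`, which is the shape the parse tree would ideally reflect: an assignment_expression whose left operand is the binary_expression.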

I'm not sure what should be done about this in general: perhaps some surgery on the LHS of assignment? I see that parenthesized expressions are allowed there by the current grammar, so this parses:

void f() { ("hello"_s) = 42; }
(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 30]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 8]
      declarator: (identifier [0, 5] - [0, 6])
      parameters: (parameter_list [0, 6] - [0, 8]))
    body: (compound_statement [0, 9] - [0, 30]
      (expression_statement [0, 11] - [0, 28]
        (assignment_expression [0, 11] - [0, 27]
          left: (parenthesized_expression [0, 11] - [0, 22]
            (user_defined_literal [0, 12] - [0, 21]
              (string_literal [0, 12] - [0, 19]
                (string_content [0, 13] - [0, 18]))
              (literal_suffix [0, 19] - [0, 21])))
          right: (number_literal [0, 25] - [0, 27]))))))

elbeno, Aug 06 '23 03:08

The first case should be fixed, I agree. As for binary expressions, that's just asking for a mess of conflicts. So, would fixing just the first case be good enough?

amaanq, Aug 11 '23 00:08

Long term, IMO no. But it's a start.

elbeno, Aug 11 '23 02:08