tree-sitter-c icon indicating copy to clipboard operation
tree-sitter-c copied to clipboard

Wrong precedences for sizeof and conditional operator

Open fabian-r opened this issue 4 years ago • 1 comments

The parse trees for certain expressions involving sizeof <expr> and the ternary conditional operator are wrong. Consider the following examples: For sizeof, this...

void f(void) {
    int a;
    a = sizeof a + 1;
}

...is parsed as...

[...]
        (assignment_expression [2, 4] - [2, 20]
          left: (identifier [2, 4] - [2, 5])
          right: (sizeof_expression [2, 8] - [2, 20]
            value: (binary_expression [2, 15] - [2, 20]
              left: (identifier [2, 15] - [2, 16])
              right: (number_literal [2, 19] - [2, 20]))))))))

...whereas this would be expected:

[...]
        (assignment_expression [2, 4] - [2, 20]
          left: (identifier [2, 4] - [2, 5])
          right: (binary_expression [2, 8] - [2, 20]
            left: (sizeof_expression [2, 8] - [2, 16]
              value: (identifier [2, 15] - [2, 16]))
            right: (number_literal [2, 19] - [2, 20])))))))

And for the ternary operator, this...

void f(void) {
    int a;
    a = 0 ? 1 : 2;
}

...is parsed as...

[...]
        (conditional_expression [2, 4] - [2, 17]
          condition: (assignment_expression [2, 4] - [2, 9]
            left: (identifier [2, 4] - [2, 5])
            right: (number_literal [2, 8] - [2, 9]))
          consequence: (number_literal [2, 12] - [2, 13])
          alternative: (number_literal [2, 16] - [2, 17]))))))

...whereas this would be expected:

[...]
        (assignment_expression [2, 4] - [2, 17]
          left: (identifier [2, 4] - [2, 5])
          right: (conditional_expression [2, 8] - [2, 17]
            condition: (number_literal [2, 8] - [2, 9])
            consequence: (number_literal [2, 12] - [2, 13])
            alternative: (number_literal [2, 16] - [2, 17])))))))

I would suggest changing the precedences in grammar.js to the following:

const PREC = {
  PAREN_DECLARATOR: -10,
  ASSIGNMENT: -2,  // from -1
  CONDITIONAL: -1,  // from -2
  DEFAULT: 0,
  LOGICAL_OR: 1,
  LOGICAL_AND: 2,
  INCLUSIVE_OR: 3,
  EXCLUSIVE_OR: 4,
  BITWISE_AND: 5,
  EQUAL: 6,
  RELATIONAL: 7,
  SHIFT: 8,  // from 9
  ADD: 9,  // from 10
  MULTIPLY: 10,  // from 11
  CAST: 11,  // from 12
  SIZEOF: 12,  // from 8
  UNARY: 13,
  CALL: 14,
  FIELD: 15,
  SUBSCRIPT: 16
};

Edit: Apparently my suggested fix breaks other examples, e.g.

char f(int);

int main(void) {
	return sizeof(f)(42);
}

(which seems odd to me since the relation between the precedence of SIZEOF and CALL is not changed by the fix.)

fabian-r avatar Aug 14 '20 10:08 fabian-r

Well, according to specification sizeof must only accept unary expr and type names.

image

While in grammar sizeof uses general $._expression, but only subset of it is needed:

https://github.com/tree-sitter/tree-sitter-c/blob/5aa0bbbfc41868a3727b7a89a90e9f52e0964b2b/grammar.js#L808-L814

image

Should grammar contain $._unary_expression = choice(...) that will contain highlighted expressions? Or is it intentionally accepts all expressions, so parser can recover from error?

SnosMe avatar Mar 10 '21 22:03 SnosMe

thx @philipturnbull for the fix!

aryx avatar Oct 05 '23 07:10 aryx