tree-sitter-cpp icon indicating copy to clipboard operation
tree-sitter-cpp copied to clipboard

Weird parsing results with nested template argument list

Open ekilmer opened this issue 1 year ago • 4 comments

This example gives weird/inconsistent results leading me to believe there's an issue with parsing nested templates:

map<string, string> m;
map<string, vector<string>> m2;
map<string, vector<vector<string>>> v;

I expect to see all types highlighted, but the last line fails to highlight string in the first index to map; instead parsing it as an identifier when I expect type_identifier:

Screenshot 2023-02-24 at 9 42 46 AM

ekilmer avatar Feb 24 '23 14:02 ekilmer

@maxbrunsfeld - if you have a bit of time to spare to look at this, I'd really appreciate it. This is completely baffling to me:

map<string, vector<vector<string>>> v;

produces:

(translation_unit [0, 0] - [1, 0]
  (declaration [0, 0] - [0, 38]
    type: (template_type [0, 0] - [0, 35]
      name: (type_identifier [0, 0] - [0, 3])
      arguments: (template_argument_list [0, 3] - [0, 35]
        (identifier [0, 4] - [0, 10]) <-- this should be (type_descriptor (type_identifier))
        (type_descriptor [0, 12] - [0, 34]
          type: (template_type [0, 12] - [0, 34]
            name: (type_identifier [0, 12] - [0, 18])
            arguments: (template_argument_list [0, 18] - [0, 34]
              (type_descriptor [0, 19] - [0, 33]
                type: (template_type [0, 19] - [0, 33]
                  name: (type_identifier [0, 19] - [0, 25])
                  arguments: (template_argument_list [0, 25] - [0, 33]
                    (type_descriptor [0, 26] - [0, 32]
                      type: (type_identifier [0, 26] - [0, 32]))))))))))
    declarator: (identifier [0, 36] - [0, 37])))

But, putting the same exact code as a field declaration makes it work:

struct Foo {
    map<string, vector<vector<string>>> v;
};
(translation_unit [0, 0] - [3, 0]
  (struct_specifier [0, 0] - [2, 1]
    name: (type_identifier [0, 7] - [0, 10])
    body: (field_declaration_list [0, 11] - [2, 1]
      (field_declaration [1, 4] - [1, 42]
        type: (template_type [1, 4] - [1, 39]
          name: (type_identifier [1, 4] - [1, 7])
          arguments: (template_argument_list [1, 7] - [1, 39]
            (type_descriptor [1, 8] - [1, 14]  <-- now it's correct
              type: (type_identifier [1, 8] - [1, 14]))
            (type_descriptor [1, 16] - [1, 38]
              type: (template_type [1, 16] - [1, 38]
                name: (type_identifier [1, 16] - [1, 22])
                arguments: (template_argument_list [1, 22] - [1, 38]
                  (type_descriptor [1, 23] - [1, 37]
                    type: (template_type [1, 23] - [1, 37]
                      name: (type_identifier [1, 23] - [1, 29])
                      arguments: (template_argument_list [1, 29] - [1, 37]
                        (type_descriptor [1, 30] - [1, 36]
                          type: (type_identifier [1, 30] - [1, 36]))))))))))
        declarator: (field_identifier [1, 40] - [1, 41])))))

The only difference is the grandparent node of the template_argument_list (field_declaration vs declaration). Putting the code inside a function scope where it's still a normal declaration still produces the incorrect identifier.

Here's the template_type and template_argument_list rules:

template_type: $ => seq(
  field('name', $._type_identifier),
  field('arguments', $.template_argument_list)
),

template_argument_list: $ => seq(
  '<',
  commaSep(choice(
    prec.dynamic(3, $.type_descriptor),
    prec.dynamic(2, alias($.type_parameter_pack_expansion, $.parameter_pack_expansion)),
    prec.dynamic(1, $._expression)
  )),
  alias(token(prec(1, '>')), '>')
),

When the template_type is in a declaration, it's choosing $._expression (which ends up as identifier) instead of type_descriptor, which also matches and should have higher precedence for the conflict. I produced the graphviz dot debug output for both, but was having a hard time figuring out why the two cases are reducing differently so this is a bit above my ability.

jdrouhard avatar Mar 02 '23 14:03 jdrouhard

setting _expression (well now _expression_not_binary as that's the one w/ conflicts) to have a dynamic precedence of -1 fixes this, but causes issues elsewhere. maybe removing identifiers from the expressions in template argument lists is an option

amaanq avatar Jul 25 '23 12:07 amaanq

I have a similar example, although it produces a hard error, not just a misclassified node:

optional<tuple<A<W>,B<X>,C<Y>,D<Z>>> f;

This produces:

(translation_unit [0, 0] - [2, 0]
  (expression_statement [0, 0] - [0, 39]
    (parameter_pack_expansion [0, 0] - [0, 38]
      pattern: (binary_expression [0, 0] - [0, 38]
        left: (template_function [0, 0] - [0, 35]
          name: (identifier [0, 0] - [0, 8])
          arguments: (template_argument_list [0, 8] - [0, 35]
            (template_function [0, 9] - [0, 24]
              name: (identifier [0, 9] - [0, 14])
              arguments: (template_argument_list [0, 14] - [0, 24]
                (type_descriptor [0, 15] - [0, 19]
                  type: (template_type [0, 15] - [0, 19]
                    name: (type_identifier [0, 15] - [0, 16])
                    arguments: (template_argument_list [0, 16] - [0, 19]
                      (type_descriptor [0, 17] - [0, 18]
                        type: (type_identifier [0, 17] - [0, 18])))))
                (binary_expression [0, 20] - [0, 23]
                  left: (identifier [0, 20] - [0, 21])
                  right: (identifier [0, 22] - [0, 23]))))
            (type_descriptor [0, 25] - [0, 29]
              type: (template_type [0, 25] - [0, 29]
                name: (type_identifier [0, 25] - [0, 26])
                arguments: (template_argument_list [0, 26] - [0, 29]
                  (type_descriptor [0, 27] - [0, 28]
                    type: (type_identifier [0, 27] - [0, 28])))))
            (template_function [0, 30] - [0, 34]
              name: (identifier [0, 30] - [0, 31])
              arguments: (template_argument_list [0, 31] - [0, 34]
                (type_descriptor [0, 32] - [0, 33]
                  type: (type_identifier [0, 32] - [0, 33]))))))
        right: (identifier [0, 37] - [0, 38])))))
test.hpp	0 ms	(MISSING "..." [0, 38] - [0, 38])

It appears to give precedence to a parameter_pack_expansion, then cannot find the ... at the end. The level of nesting (optional<tuple<...>> not just tuple<...>) and number of template arguments (optional<tuple<A<W>,B<X>,C<Y>,D<Z>>> not just optional<tuple<A<W>,B<X>,C<Y>>> is necessary to trigger the error.

The change that @amaanq mentions above (to set the dynamic precedence of _expression to -1 in the part of the code that @jdrouhard posted) removes the hard error, although now we're back to a misclassified node:

(translation_unit [0, 0] - [2, 0]
  (declaration [0, 0] - [0, 39]
    type: (template_type [0, 0] - [0, 36]
      name: (type_identifier [0, 0] - [0, 8])
      arguments: (template_argument_list [0, 8] - [0, 36]
        (type_descriptor [0, 9] - [0, 35]
          type: (template_type [0, 9] - [0, 35]
            name: (type_identifier [0, 9] - [0, 14])
            arguments: (template_argument_list [0, 14] - [0, 35]
              (template_function [0, 15] - [0, 19]  <<<<<<<<********* template_function, should be type_descriptor?
                name: (identifier [0, 15] - [0, 16])
                arguments: (template_argument_list [0, 16] - [0, 19]
                  (type_descriptor [0, 17] - [0, 18]
                    type: (type_identifier [0, 17] - [0, 18]))))
              (type_descriptor [0, 20] - [0, 24]
                type: (template_type [0, 20] - [0, 24]
                  name: (type_identifier [0, 20] - [0, 21])
                  arguments: (template_argument_list [0, 21] - [0, 24]
                    (type_descriptor [0, 22] - [0, 23]
                      type: (type_identifier [0, 22] - [0, 23])))))
              (type_descriptor [0, 25] - [0, 29]
                type: (template_type [0, 25] - [0, 29]
                  name: (type_identifier [0, 25] - [0, 26])
                  arguments: (template_argument_list [0, 26] - [0, 29]
                    (type_descriptor [0, 27] - [0, 28]
                      type: (type_identifier [0, 27] - [0, 28])))))
              (type_descriptor [0, 30] - [0, 34]
                type: (template_type [0, 30] - [0, 34]
                  name: (type_identifier [0, 30] - [0, 31])
                  arguments: (template_argument_list [0, 31] - [0, 34]
                    (type_descriptor [0, 32] - [0, 33]
                      type: (type_identifier [0, 32] - [0, 33]))))))))))
    declarator: (identifier [0, 37] - [0, 38])))

Well, this is not an incorrect parse, that could indeed be a template_function, but it seems inconsistent to prefer template_function for the first template argument, but type_descriptor for the others.

To have a shot in the dark, I wonder whether line 155 has something to do with the precedences resolving this way, but have not found an alternative that keeps other tests working:

    type_descriptor: (_, original) => prec.right(original),

lawmurray avatar Oct 27 '23 15:10 lawmurray

A further update: setting the precedence of _expression to 0 rather than -1 produces the same output above (i.e. one of the arguments is template_function), but now tree-sitter test passes at least.

lawmurray avatar Oct 27 '23 15:10 lawmurray