tree-sitter icon indicating copy to clipboard operation
tree-sitter copied to clipboard

ts_node_field_name_for_child returns incorrect results if child_index != structural_child_index

Open aappleby opened this issue 3 years ago • 3 comments

Field names for a child node are not being returned correctly by ts_node_field_name_for_child - field names should be indexed by structural_child_index, but they're being indexed by child_index instead. This causes fields to be associated with the wrong nodes in some cases.

Repro case:

#include <stdio.h>
#include <string.h>
#include "tree_sitter/api.h"

extern "C" {
  extern const TSLanguage* tree_sitter_cpp();
}

int main(int argc, char** argv) {

  const char* repro = "struct Foo { static const int bar = 10; }";

  auto parser = ts_parser_new();
  auto lang = tree_sitter_cpp();
  ts_parser_set_language(parser, lang);

  auto tree = ts_parser_parse_string(parser, NULL, repro, (int)strlen(repro));
  auto node_root = ts_tree_root_node(tree);
  auto node_struct = ts_node_child(node_root, 0);
  auto node_body = ts_node_child(node_struct, 2);
  auto node_decl = ts_node_child(node_body, 1);

  printf("Iterating over field declarations using ts_node_field_name_for_child:\n");
  for (int i = 0; i < (int)ts_node_child_count(node_decl); i++) {
    auto child = ts_node_child(node_decl, i);
    auto field = ts_node_field_name_for_child(node_decl, i);
    printf("c%02d | %24s | %24s\n", i, field, ts_node_type(child));
  }
  printf("\n");

  printf("Iterating over field declarations using ts_tree_cursor:\n");
  auto cursor = ts_tree_cursor_new(node_decl);
  ts_tree_cursor_goto_first_child(&cursor);
  int i = 0;
  do {
    auto child = ts_tree_cursor_current_node(&cursor);
    auto field = ts_tree_cursor_current_field_name(&cursor);
    printf("c%02d | %24s | %24s\n", i, field, ts_node_type(child));
    i++;
  } while (ts_tree_cursor_goto_next_sibling(&cursor));
}

Output:

Iterating over field declarations using ts_node_field_name_for_child:
c00 |                     type |  storage_class_specifier
c01 |               declarator |           type_qualifier
c02 |                   (null) |           primitive_type
c03 |            default_value |         field_identifier
c04 |                   (null) |                        =
c05 |                   (null) |           number_literal
c06 |                   (null) |                        ;

Iterating over field declarations using ts_tree_cursor:
c00 |                   (null) |  storage_class_specifier
c01 |                   (null) |           type_qualifier
c02 |                     type |           primitive_type
c03 |               declarator |         field_identifier
c04 |                   (null) |                        =
c05 |            default_value |           number_literal
c06 |                   (null) |                        ;

aappleby avatar Feb 05 '22 20:02 aappleby

Thanks for the report!

maxbrunsfeld avatar Feb 05 '22 21:02 maxbrunsfeld

Hello,

I faced a similar issue. Is there any progress on this bug?

Here's how I reproduced it:

#include <dlfcn.h>
#include <string.h>

#include <tree_sitter/api.h>

typedef TSLanguage *(tree_sitter_lang)(void);

TSLanguage *get(const char *lang, const char *path) {
  char buf[256];
  snprintf(buf, sizeof(buf), "tree_sitter_%s", lang);
  void *lib = dlopen(path, RTLD_NOW);
  tree_sitter_lang *make_ts_language = dlsym(lib, buf);
  return make_ts_language();
}

int main(int argc, char *argv[]) {
  TSLanguage *ruby = get("ruby", "/Users/firas/projects/github/"
                                 "tree-sitter-ruby/libtree-sitter-ruby.dylib");
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, ruby);

  const char *program = "def mul(a, b); a * b; end";

  TSTree *tree = ts_parser_parse_string(parser, NULL, program, strlen(program));
  TSNode root_node = ts_tree_root_node(tree);

  printf("%s\n", ts_node_string(root_node));
  /*
  **  =>
  **
  ** (program
  **   (method
  **     name:       (identifier)
  **     parameters: (method_parameters (identifier) (identifier))
  **     (binary
  **       left:  (identifier)
  **       right: (identifier))))
  */

  TSNode interesting = ts_node_child(root_node, 0);
  printf("%s\n", ts_node_string(interesting));
  /*
  ** =>
  **
  ** (method
  **   name:       (identifier)
  **   parameters: (method_parameters (identifier) (identifier))
  **   (binary
  **     left:  (identifier)
  **     right: (identifier))))
  */

  for (int i = 0; i < ts_node_child_count(interesting); i++) {
    printf("%s\n", ts_node_field_name_for_child(interesting, i));
  }
  /*
  ** =>
  **
  ** (null)
  ** name
  ** (null)
  ** (null)
  ** (null)
  ** (null)
  ** (null)
  */
}

Obviously this is wrong. It needs to return name and parameters.

stackmystack avatar Jun 22 '22 11:06 stackmystack

+1 for this, same issue

Enter-tainer avatar Jun 23 '22 17:06 Enter-tainer

+1 for this as well. I noticed this issue is affecting emacs-29, whose built-in treesit-explore-mode displays the AST using indexes and not cursors.

There is this comment in emacs/src/treesit.c:

/* Commentary                                                                   
                                                                                
   The Emacs wrapper of tree-sitter does not expose everything the C            
   API provides, most notably: [...]

   - It doesn't expose the tree cursor, either.  Presumably, Lisp is            
     slow enough to make insignificant any performance advantages from          
     using the cursor.  Not exposing the cursor also minimizes the              
     number of new types this adds to Emacs Lisp; currently, this adds          
     only the parser, node, and compiled query types.```

This suggests that working around this issue in Emacs Lisp would be problematic. I guess a work around could go to internal Emacs C code, but it feels like the issue should be really addressed in tree-sitter itself.

ptroja avatar Jan 02 '23 10:01 ptroja