tree-sitter
tree-sitter copied to clipboard
ts_node_field_name_for_child returns incorrect results if child_index != structural_child_index
Field names for a child node are not being returned correctly by ts_node_field_name_for_child - field names should be indexed by structural_child_index, but they're being indexed by child_index instead. This causes fields to be associated with the wrong nodes in some cases.
Repro case:
#include <stdio.h>
#include <string.h>
#include "tree_sitter/api.h"
extern "C" {
extern const TSLanguage* tree_sitter_cpp();
}
int main(int argc, char** argv) {
const char* repro = "struct Foo { static const int bar = 10; }";
auto parser = ts_parser_new();
auto lang = tree_sitter_cpp();
ts_parser_set_language(parser, lang);
auto tree = ts_parser_parse_string(parser, NULL, repro, (int)strlen(repro));
auto node_root = ts_tree_root_node(tree);
auto node_struct = ts_node_child(node_root, 0);
auto node_body = ts_node_child(node_struct, 2);
auto node_decl = ts_node_child(node_body, 1);
printf("Iterating over field declarations using ts_node_field_name_for_child:\n");
for (int i = 0; i < (int)ts_node_child_count(node_decl); i++) {
auto child = ts_node_child(node_decl, i);
auto field = ts_node_field_name_for_child(node_decl, i);
printf("c%02d | %24s | %24s\n", i, field, ts_node_type(child));
}
printf("\n");
printf("Iterating over field declarations using ts_tree_cursor:\n");
auto cursor = ts_tree_cursor_new(node_decl);
ts_tree_cursor_goto_first_child(&cursor);
int i = 0;
do {
auto child = ts_tree_cursor_current_node(&cursor);
auto field = ts_tree_cursor_current_field_name(&cursor);
printf("c%02d | %24s | %24s\n", i, field, ts_node_type(child));
i++;
} while (ts_tree_cursor_goto_next_sibling(&cursor));
}
Output:
Iterating over field declarations using ts_node_field_name_for_child:
c00 | type | storage_class_specifier
c01 | declarator | type_qualifier
c02 | (null) | primitive_type
c03 | default_value | field_identifier
c04 | (null) | =
c05 | (null) | number_literal
c06 | (null) | ;
Iterating over field declarations using ts_tree_cursor:
c00 | (null) | storage_class_specifier
c01 | (null) | type_qualifier
c02 | type | primitive_type
c03 | declarator | field_identifier
c04 | (null) | =
c05 | default_value | number_literal
c06 | (null) | ;
Thanks for the report!
Hello,
I faced a similar issue. Is there any progress on this bug?
Here's how I reproduced it:
#include <dlfcn.h>
#include <string.h>
#include <tree_sitter/api.h>
typedef TSLanguage *(tree_sitter_lang)(void);
TSLanguage *get(const char *lang, const char *path) {
char buf[256];
snprintf(buf, sizeof(buf), "tree_sitter_%s", lang);
void *lib = dlopen(path, RTLD_NOW);
tree_sitter_lang *make_ts_language = dlsym(lib, buf);
return make_ts_language();
}
int main(int argc, char *argv[]) {
TSLanguage *ruby = get("ruby", "/Users/firas/projects/github/"
"tree-sitter-ruby/libtree-sitter-ruby.dylib");
TSParser *parser = ts_parser_new();
ts_parser_set_language(parser, ruby);
const char *program = "def mul(a, b); a * b; end";
TSTree *tree = ts_parser_parse_string(parser, NULL, program, strlen(program));
TSNode root_node = ts_tree_root_node(tree);
printf("%s\n", ts_node_string(root_node));
/*
** =>
**
** (program
** (method
** name: (identifier)
** parameters: (method_parameters (identifier) (identifier))
** (binary
** left: (identifier)
** right: (identifier))))
*/
TSNode interesting = ts_node_child(root_node, 0);
printf("%s\n", ts_node_string(interesting));
/*
** =>
**
** (method
** name: (identifier)
** parameters: (method_parameters (identifier) (identifier))
** (binary
** left: (identifier)
** right: (identifier))))
*/
for (int i = 0; i < ts_node_child_count(interesting); i++) {
printf("%s\n", ts_node_field_name_for_child(interesting, i));
}
/*
** =>
**
** (null)
** name
** (null)
** (null)
** (null)
** (null)
** (null)
*/
}
Obviously this is wrong. It needs to return name and parameters.
+1 for this, same issue
+1 for this as well. I noticed this issue is affecting emacs-29, whose built-in treesit-explore-mode displays the AST using indexes and not cursors.
There is this comment in emacs/src/treesit.c:
/* Commentary
The Emacs wrapper of tree-sitter does not expose everything the C
API provides, most notably: [...]
- It doesn't expose the tree cursor, either. Presumably, Lisp is
slow enough to make insignificant any performance advantages from
using the cursor. Not exposing the cursor also minimizes the
number of new types this adds to Emacs Lisp; currently, this adds
only the parser, node, and compiled query types.```
This suggests that working around this issue in Emacs Lisp would be problematic. I guess a work around could go to internal Emacs C code, but it feels like the issue should be really addressed in tree-sitter itself.