tree-sitter icon indicating copy to clipboard operation
tree-sitter copied to clipboard

probably UB: `QueryMatches::next` in Rust library clobbers previously returned values.

Open umanwizard opened this issue 1 year ago • 1 comments

Consider test.c:

void
do_stuff()
{
  int x = 42;
  char const *y = "Hello, world!";
  float z = 3.14159;
}

and main.rs:

use tree_sitter::{Parser, Query, QueryCursor};

fn main() {
    let mut parser = Parser::new();
    let l = tree_sitter_c::language();
    parser.set_language(l).unwrap();

    let contents = std::fs::read("/home/brennan/test.c").unwrap();
    let tree = parser.parse(&contents, None).unwrap();
    let q = Query::new(l, "(declaration type: (primitive_type) @t)").unwrap();
    let mut qc = QueryCursor::new();

    let matches: Vec<_> = qc
        .matches(&q, tree.root_node(), contents.as_slice())
        .collect();
    for m in matches {
        println!(
            "{}",
            m.captures[0].node.utf8_text(contents.as_slice()).unwrap()
        );
    }
}

This prints:

float
float
float

rather than the expected

int
char
float

The issue is that each call to QueryMatches::next constructs a QueryMatch object that points to data returned by ts_query_cursor_next_match. However, this data references buffers that become invalidated on each new call to ts_query_cursor_next_match. So it's in general not okay to keep the old QueryMatch objects alive when constructing new ones.

umanwizard avatar Jan 05 '24 16:01 umanwizard

As an aside / tangentially related issue: the C functions should be documented to clearly explain the lifetime of validity of the data they return.

umanwizard avatar Jan 05 '24 16:01 umanwizard

Closing as a duplicate of #2843.

ObserverOfTime avatar Apr 12 '24 16:04 ObserverOfTime