tree-sitter-typescript
bug: JSX captures whitespaces in nested, multiline tags
Did you check existing issues?
- [X] I have read all the tree-sitter docs if it relates to using the parser
- [X] I have searched the existing issues of tree-sitter-typescript
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
No response
Describe the bug
For a given TSX template,
a["b"] = <C d="e">
<F></F>
{ g() }
</C>;
the nested jsx_opening_element that starts on its own line is captured together with the whitespace preceding it: its text is \n <F> instead of just <F>.
Steps To Reproduce/Bad Parse Tree
The parse tree is correct in both cases, but the ranges of its nodes are not. I have not found a way to include ranges in the node-based tests with *.txt files, so I've created a draft Rust test:
#[cfg(test)]
mod tests_f_node {
    use tree_sitter::Node;

    use super::*;

    #[test]
    fn tsx_tag_parse_ranges() {
        let code = r#"
a["b"] = <C d="e">
<F></F>
{ g() }
</C>;
"#;
        let mut parser = tree_sitter::Parser::new();
        parser
            .set_language(&super::language_tsx())
            .expect("Error loading TypeScript TSX grammar");
        let tree = parser.parse(code, None).unwrap();
        let root_node = tree.root_node();
        let f_node = get_f_node(root_node, code).expect("<F> node not found");

        // Assert the ranges. Modify these values according to the actual positions in your code.
        let start_byte = f_node.start_byte();
        let end_byte = f_node.end_byte();
        assert_eq!(start_byte, 36); // Replace with the correct start byte
        assert_eq!(end_byte, 39); // Replace with the correct end byte

        let start_position = f_node.start_position();
        let end_position = f_node.end_position();
        assert_eq!(start_position.row, 2); // Line number containing <F>
        assert_eq!(start_position.column, 16); // Column where <F> starts
        assert_eq!(end_position.row, 2);
        assert_eq!(end_position.column, 19); // Column where <F> ends
    }

    // Depth-first search for the first jsx_opening_element whose text is exactly "<F>".
    fn get_f_node<'a>(node: Node<'a>, code: &'a str) -> Option<Node<'a>> {
        for child in node.children(&mut node.walk()) {
            if child.kind() == "jsx_opening_element"
                && dbg!(child.utf8_text(code.as_bytes()).unwrap()) == "<F>"
            {
                return Some(child);
            }
            if let Some(found) = get_f_node(child, code) {
                return Some(found);
            }
        }
        None
    }
}
which outputs
---- tests_f_node::tsx_tag_parse_ranges stdout ----
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<C d=\"e\">"
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "\n <F>"
thread 'tests_f_node::tsx_tag_parse_ranges' panicked at bindings/rust/lib.rs:97:50:
<F> node not found
stack backtrace:
on current master.
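Because the draft test compares the node text to "<F>" exactly, the regression surfaces as "<F> node not found" rather than as a range mismatch. A minimal sketch of a whitespace-tolerant comparison follows; the helper name `leading_whitespace_before` is hypothetical and not part of tree-sitter or of the test above. It uses only the standard library and reports how many leading whitespace bytes the captured text contains, so a test could still locate the node and then assert that the count is zero.

```rust
/// Hypothetical helper (not part of tree-sitter): returns the number of
/// leading whitespace bytes in `node_text` if the rest of the text equals
/// `tag`, or `None` if the text is some other element.
fn leading_whitespace_before(node_text: &str, tag: &str) -> Option<usize> {
    // Strip leading whitespace (spaces, tabs, newlines) before comparing.
    let trimmed = node_text.trim_start();
    if trimmed == tag {
        // The length difference is the number of whitespace bytes captured.
        Some(node_text.len() - trimmed.len())
    } else {
        None
    }
}
```

With a helper like this, the search in get_f_node could match on `leading_whitespace_before(text, "<F>").is_some()`, and the test could then fail with an explicit message about the extra whitespace instead of failing to find the node.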
Expected Behavior/Parse Tree
I've bisected the regression to the following commit:
37ced086ad8bb4fa67e8c53711e9f30e869bb78f is the first bad commit
commit 37ced086ad8bb4fa67e8c53711e9f30e869bb78f (HEAD)
Author: Amaan Qureshi
Date: Fri Jul 5 23:13:15 2024 -0400
chore: generate
tsx/src/grammar.json | 370 +-
tsx/src/node-types.json | 843 +-
tsx/src/parser.c | 552504 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------------------------------------------------
typescript/src/grammar.json | 366 +-
typescript/src/node-types.json | 847 +-
typescript/src/parser.c | 530546 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------
6 files changed, 440659 insertions(+), 644817 deletions(-)
and before this commit everything works fine:
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<C d=\"e\">"
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<F>"
thread 'tests_f_node::tsx_tag_parse_ranges' panicked at bindings/rust/lib.rs:103:9:
assertion `left == right` failed
// This failure happens because my test is a draft (the asserted offsets are placeholders), but the test already exposes the issue, so it is useful in its current state.
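The placeholder offsets in the draft could be derived from the source string instead of being hard-coded. The sketch below is one way to do that with the standard library only; `ExpectedSpan`, `position_at`, and `expected_span` are hypothetical names introduced for illustration. It assumes rows and columns are zero-based and that columns are counted in bytes, which is how tree-sitter reports positions.

```rust
/// Hypothetical container for the expected location of a snippet:
/// byte offsets plus zero-based (row, column) pairs, columns in bytes.
#[derive(Debug, PartialEq)]
struct ExpectedSpan {
    start_byte: usize,
    end_byte: usize,
    start: (usize, usize),
    end: (usize, usize),
}

/// Converts a byte offset in `code` into a zero-based (row, column) pair.
/// `byte` must lie on a character boundary.
fn position_at(code: &str, byte: usize) -> (usize, usize) {
    let before = &code[..byte];
    // The row is the number of newlines before the offset.
    let row = before.matches('\n').count();
    // The column is the distance in bytes from the last newline, if any.
    let column = match before.rfind('\n') {
        Some(newline) => byte - newline - 1,
        None => byte,
    };
    (row, column)
}

/// Locates the first occurrence of `needle` in `code` and returns where
/// a node covering exactly that text should start and end.
fn expected_span(code: &str, needle: &str) -> Option<ExpectedSpan> {
    let start_byte = code.find(needle)?;
    let end_byte = start_byte + needle.len();
    Some(ExpectedSpan {
        start_byte,
        end_byte,
        start: position_at(code, start_byte),
        end: position_at(code, end_byte),
    })
}
```

In the test, `expected_span(code, "<F>")` would supply the values to compare against `f_node.start_byte()`, `f_node.end_byte()`, `f_node.start_position()`, and `f_node.end_position()`, so the assertions stay correct if the template's indentation changes.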
Repro
See the test above
Hello, I'm interested in fixing this and would appreciate any pointers.
I was able to resolve the issue by rerunning npm run build on my PC.