vim: Add HTML tag support for #4503
a simple code for html tag support, I've only done the basics, and if it's okay, I'll optimize and organize the code, and adapt other parts like is_multiline, always_expands_both_ways, target_visual_mode, etc
@weartist Thank you for taking a pass at this — it is a very commonly requested feature!
I'd love to pair with you to get this over the line, you can book time here: https://calendly.com/conradirwin/pairing. But leaving notes below in case you want to tackle this yourself:
We should not iterate over the entire file to do this. Instead we can make use of zed's tight integration with tree-sitter to traverse the syntax tree of the file instead of the file itself.
I think the right approach would be to call multi_buffer.range_for_syntax_ancestor repeatedly. At some point you'll get a range that starts with <tag and ends with </tag>. If you are doing vit you want the range you got just before that, or for vat the range itself.
We should handle <> and </> for React.
(This won't matter given the change above, but currently you're using the text of the display map; this includes inlay hints, and doesn't include folded text. Instead we should be operating exclusively at the multibuffer/buffer level and mapping back to a display point when we're done).
We'll want to add some tests for this. At the moment we don't have any example of vim tests that load HTML files; but you could copy the example of VimTest::new_typescript. I am not sure how easy it would be, but ideally some of these would use the NeovimBackedTestContext so we can verify that the behavior matches vim where we expect it do (but I would expect some divergence, for example the empty tags for React).
@weartist Thank you for taking a pass at this — it is a very commonly requested feature!
I'd love to pair with you to get this over the line, you can book time here: https://calendly.com/conradirwin/pairing. But leaving notes below in case you want to tackle this yourself:
We should not iterate over the entire file to do this. Instead we can make use of zed's tight integration with tree-sitter to traverse the syntax tree of the file instead of the file itself.
I think the right approach would be to call
multi_buffer.range_for_syntax_ancestorrepeatedly. At some point you'll get a range that starts with<tagand ends with</tag>. If you are doingvityou want the range you got just before that, or forvatthe range itself.We should handle
<>and</>for React.(This won't matter given the change above, but currently you're using the text of the display map; this includes inlay hints, and doesn't include folded text. Instead we should be operating exclusively at the multibuffer/buffer level and mapping back to a display point when we're done).
We'll want to add some tests for this. At the moment we don't have any example of vim tests that load HTML files; but you could copy the example of
VimTest::new_typescript. I am not sure how easy it would be, but ideally some of these would use theNeovimBackedTestContextso we can verify that the behavior matches vim where we expect it do (but I would expect some divergence, for example the empty tags for React).
Thank you very much for your suggestion, I will use this way to optimize the current code, if the idea is correct, I will add the test code in the future, but I may not be able to pair program with you, my English is terrible, especially listening and speaking, although I would love to be able to do so, I know it will teach me a lot of new things
@ConradIrwin hello, I tested the case where the method was changed to range_for_syntax_ancestor, which is useful when looking for tag matches, and can solve the case of e.g. dat, but it may not be suitable for e.g. dit at edge case, because when matching, the previous match result is not exactly the range of a pair of tags, for example:
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<mock>normˇal title</title>
</head>
</html>
The initial situation is that we are inside the title tag, then we execute method test_range_for_syntax_ancestor, and with we call repeatedly, they will return are two results:
"<mock>mock tag</tail>\n"
"<head>\n <meta charset=\"UTF-8\" />\n <mock>mock tag</tail>\n</head>"
As you can see, the <meta charset="UTF-8" /> is ignored between calls, therefore, if we want to get the content inside the tag, we can't simply use the previous result, but this is an error in this case with an incorrect tag, and I'm not sure if we need to consider an extreme case like 'normˇal title'
@weartist ok, that makes sense - I wasn't thinking well enough when I wrote the comment. That said, once we have the range we need for vat, it seems like we can remove the opening/closing tags to get the range for vit in one of two ways:
- We can move to a more direct traversal of the syntax tree, see https://github.com/zed-industries/zed/pull/7791/files#diff-88ce80cd4cc618e4a90758b06ffe981f48b98b117309c81e1257b471135082acR367-R368 for an example. You could use this to find the enclosing range and then skip over the nodes for the open/close tags.
- Alternatively, you could skip them by iterating over the buffer characters and implementing a small parser: see movement::find_boundary and find_preceding_boundary. This might be tricky to get attribute quoting right, but a relatively simple approach is probably good enough.
@ConradIrwin Hi, I changed the code to be based on tree-sitter, but there are still two problems:
- regular expressions may affect a little bit of performance, and there are very few places where html matching is used at the moment, can consider using base string match, or setting regular expressions to lazy static, and consider current we not use html too much, I not create a separate structure for html Abstract
- I don't have a file restriction for the time being, if the current approach is not bad, I will add test cases and add some restrictions
Thank you again for your help
@weartist This code seems ok to me, thank you!
I agree with your concern about the regexes; and I've pushed a new version to this branch that iterates over characters (in the common case where the syntax ancestor doesn't match, it will now be much quicker to fail).
I don't think we should add a file-type filtering, as I want this to work in any language that "looks right". Currently that is just TSX and HTML, but if someone installed an XML extension, this should just work.
@weartist This code(代码) seems ok to me, thank you! 这段代码对我来说似乎没问题,谢谢!
I agree with your concern(关注) about the regexes; and I've pushed a new version(版本) to this branch that iterates over characters (in the common case where the syntax ancestor(祖先) doesn't match, it will now be much quicker to fail).我同意你对正则表达式的担忧;我已经将一个新版本推送到了这个迭代字符的分支(在语法祖先不匹配的常见情况下,现在失败的速度会更快)。
I don't think we should add a file-type filtering, as I want this to work in any language that "looks right". Currently(目前) that is just TSX and HTML, but if someone installed an XML extension(扩展), this should just work.我认为我们不应该添加文件类型过滤,因为我希望它可以在任何“看起来正确”的语言中工作。目前只有 TSX 和 HTML,但如果有人安装了 XML 扩展,那么这应该可以工作。
Cool, Now it looks like it works pretty well, I tested some bugs and boundary cases and he works fine
Awesome! Thank you again