nvim-yati icon indicating copy to clipboard operation
nvim-yati copied to clipboard

Erroneous Nodes, Realtime Indentation and the future of TS based indentation

Open vhyrro opened this issue 2 years ago • 3 comments

Hey! Glad to see such a project pop up - indentation through treesitter is hard, and I respect your efforts to make it better :D

I'd love to help out here n there wherever I can, getting something like this perhaps merged into nvim-treesitter would be phenomenal, as it would fix indentation issues in several languages and projects.

Here are some ideas that I believe you may want to have on your TODO list, just based on my experience of treesitter issues:

Erroneous Nodes

A serious problem with TS-based indentation tends to be incomplete nodes. For example in python doing this:

if something:
	print("True")
	else:
    print("Something else")

Would yield a completely unexpected and invalid syntax tree:

if_statement [0, 0] - [3, 27]
  condition: identifier [0, 3] - [0, 12]
  consequence: block [1, 4] - [3, 27]
    expression_statement [1, 4] - [1, 17]
      call [1, 4] - [1, 17]
        function: identifier [1, 4] - [1, 9]
        arguments: argument_list [1, 9] - [1, 17]
          string [1, 10] - [1, 16]
    expression_statement [2, 4] - [3, 27]
      assignment [2, 4] - [3, 27]
        left: identifier [2, 4] - [2, 8]
        type: type [3, 4] - [3, 27]
          call [3, 4] - [3, 27]
            function: identifier [3, 4] - [3, 9]
            arguments: argument_list [3, 9] - [3, 27]
              string [3, 10] - [3, 26]

What's worse is that there are no errors in this tree, so everything looks normal, even though it isn't. I suppose we simply have to use regex-based fallbacks in scenarios like this.

In the case of python checking for ^%s+else isn't difficult. What would be a little more difficult is figuring out when we should use regex-based fallbacks, and when we should try catching treesitter nodes. There'd also have to be a way to define these things in the configuration tables.

Realtime Indentation to Resolve Ambiguities

If this feels like it's too intrusive I suppose we could disable it by default. It'd be great to have a feature that indents when certain nodes are matched. For example in this code:

if something then
	print("Hello!")

If I were to hit enter, it'd indent my cursor to the level of the print call - this is expected. If I were to then type end however it'd make sense for the indentation engine to automatically dedent that line to the level of the if statement, like so:

if something then
	print("Hello!")
end

Scheme Configuration

Your plugin uses lua tables to determine what nodes to indent etc. This isn't a problem in of itself in my opinion! But I do believe having a translation layer from a .scm file to a lua configuration table would be perfect to make it consistent with other treesitter plugins. To make this translation layer it'd be required to parse the .scm file with e.g. the query treesitter parser and then walk through the syntax tree and convert it to config tables on startup. Despite what it may seem like this shouldn't slow down startup time much, walking through syntax trees is speedy.

Doing this would also allow captures and predicates for more fine control over the indentation engine's behaviour :)


All in all what do you think of these ideas? I could start toying around with some of these ideas if that's okay with you, indentation is something I really care about and believe neovim lacks good support of :P

vhyrro avatar Jan 09 '22 16:01 vhyrro

Thanks for your suggestions! I did think about some similar ideas during initial development, and I'd like to share my own thoughts for them here.

Regex Fallback

I did encounter the exactly same problem while dealing with python indent:

if something:
	print("True")
	else:
    print("Something else")

I definitely agree that regex-based indent is the appropriate solution for this situation, and actually that's on my initial plan for this plugin. But I hesitate to add this to the plugin, because the configuration and calculation of regex-based indent might be better to be externalized to keep this plugin TS-based IMO. In fact, user could already implement the fallback like this:

function! GetCompositeIndent()
	let s:ts_indent =  v:lua.require('nvim-yati.indent').get_indent()
	if s:ts_indent >= 0
		return s:ts_indent
	else
		return GetRegexIndent()
	endif
endfunction

set indentexpr=GetCompositeIndent()

To be honest, I prefer the regex-based indent become builtin for nvim as many editors already support it natively, like vscode: https://code.visualstudio.com/api/language-extensions/language-configuration-guide#indentation-rules. But if that's too hard or the team refused it to become part of the core, then we could consider implementing it in this plugin.

Reindent as Typing

This somehow overlaps with the functionality of 'indentkeys' and I think is already handled by vim. The example presented could be addressed by:

set indentkeys+=0=end

This is usually set in the bundled filetype plugins of vim.

Scheme Configuration

Great idea and I've never thought about it! But I'm afraid that it's too hard or impossible by design. I choose not to use scm because nvim currently captures the matched node for query files recursively but the algorithm of the plugin traverse the tree from bottom to up. This would cause big performance issue because we need to find all the matched nodes of the whole tree on every indent calculation, like what the indent module of nvim-treesitter currently does: https://github.com/nvim-treesitter/nvim-treesitter/blob/0d53066533643fac8d9a1a247bde3cf9132077ad/lua/nvim-treesitter/indent.lua#L28-L49.

I felt extreme lags while using it previously, and since the type name of tree node is sufficient for indent calculation, I decided to use Lua table to have more control of the query matching and test the node on demand. This drastically reduces the time spent for calculation and the typing is much more fluent.

Are there any use cases for extended tree-sitter query syntax? Currently I think only the node's type name is enough and executing tree-sitter query is time-costing because nvim captures the matches recursively on the tree, so sticking to Lua table would be fine.

I'm not sure whether it is clear enough, so let me know whether further explanations are needed.

yioneko avatar Jan 10 '22 03:01 yioneko

Heyo! Sorry for the late response, life got busy.

But I hesitate to add this to the plugin, because the configuration and calculation of regex-based indent might be better to be externalized to keep this plugin TS-based IMO

This is fine, but then in my humble opinion it defeats the purpose of making it "better indentation", since treesitter indentation on its own simply cannot be perfect and needs to be aided through other means. I feel like adding just a little bit more logic for that regex-based indent wouldn't hurt!

This somehow overlaps with the functionality of 'indentkeys' and I think is already handled by vim

Gotcha, makes sense. It's weird though that in my experience it tends to not work every time, but I may be doing something wrong in my config :thinking:.

Great idea and I've never thought about it! But I'm afraid that it's too hard or impossible by design. I choose not to use scm because nvim currently captures the matched node for query files recursively but the algorithm of the plugin traverse the tree from bottom to up

I see. I was thinking of simply making a translation layer though, one that would only be applied once on startup, and not one that would be directly used for calculation at runtime. It would convert the .scm file for every language and essentially generate the same tables you'd find in configs/*.lua. I thought it'd just make it a little easier to construct indent queries quickly, plus allow users to extend those queries in after/queries/lang/indents.scm. I'm not sure whether it would be overkill however.

Thanks for taking your time to respond!

vhyrro avatar Jan 13 '22 18:01 vhyrro

Recently I started experimenting with the custom hooks to handle situations difficult for treesitter in 91012b666ef7aa70b0dc88156b7f57359e1b9fe4. We can use these hooks to change the normal behavior of the core algorithm by returning a value of indent shift and the new node to traverse from. Regex-based calculation logic could be added there.

For the abnormal cases, feel free to open issues for them and provide the reproducible code snippet. Handling more edge cases will definitely make this plugin better :)

I was thinking of simply making a translation layer though, one that would only be applied once on startup, and not one that would be directly used for calculation at runtime.

If the translation keeps compatibility with config in Lua side, that would be fine. What I consider as difficult parts are the advanced predicates in query language, that might be impossible to implement and should be ignored due to the limitation of Lua config. I'm not sure whether the mentioned translation layer is meant to be fully compliant with treesitter query language or just a limited subset. And if it's the former one, please let me know the implementation idea or works need to be done in Lua side.

yioneko avatar Jan 14 '22 03:01 yioneko

For query-based approach, indent module in nvim-treesitter got back its development recently. There is no plan to support that here in the short term. BTW, a new DSL(might be lisp-like) is my ideal solution for indent.

For complex situation and error nodes, we already have support for regex fallback and custom handlers.

yioneko avatar Apr 10 '23 10:04 yioneko