helix
Disabling tree-sitter on big files
Opening large files (e.g. >100 MB) that have an associated syntax highlighting grammar leads to high memory usage, and the file takes a while to load: https://github.com/tree-sitter/tree-sitter/issues/222
Ideally, opening such a file would prompt the user to disable syntax highlighting for that file.
Refactor tree-sitter-highlight to work like the atom one, recomputing partial tree updates.
This is already done, the base layer is incremental, and injections are recomputed.
I personally don't think we should bother highlighting above a certain very large file size.
I agree, but some people (like me) like syntax highlighting on big files even if it's not actually that useful. So yeah, I agree with you which is why I want this to be a plugin in the future. I might do it myself if I'm not too busy. For the prompt, I'm not sure how dependent it would be on tui since we're switching to termwiz.
~I'll take this issue on (prompt to disable syntax highlighting on big files).~ Unassigning myself.
Wouldn't we have to use syntect anyway to provide highlights for languages without tree-sitter at some point in the future? Installing a plugin for syntax highlighting of some filetypes seems annoying.
Maybe, syntect seems fairly light and there are a lot of common dependencies between it and Helix. However it might be a problem because it's Regex-based, and we're already trying to move away from the standard Regex library.
@kirawi syntect uses the oniguruma regex engine by default, though it can be configured to use fancy-regex instead. If you mean regex with the "standard Regex library", why are you trying to move away from it? cargo tree shows that regex is used by 5 different crates in the dependency graph.
I don't have any plans to drop the regex library, I also don't plan on supporting syntect. The scope of this issue should purely be on disabling highlighting on 200MB+ files.
Hi, any update on this one? I opened a 21k+ line Lua file and it caused a stack overflow in tree-sitter, crashing helix. While I try to gather more info on that particular stack overflow and report it upstream, it would be nice to be able to disable tree-sitter on large files right within helix.
This is a deal-breaker for me as well. As soon as I open a 820k+ LoC C file for registers, helix crashes. Change the file extension to txt and it instantly loads.
Apart from disabling tree-sitter for files larger than some number of MB, I think we could also implement an "open as plain text" command in the file picker.
edit: not just the file picker, but also search results, references, etc.
I am seeing this as well in a long .json file. Thankfully I found this issue and changing the lang to .txt fixed it.
This should be the default for large files.
I still think there should be an upper bound but we can set it to something really high (200~500MB?)
Are you just concerned about the 500ms delay when first opening such a large file (the time until the parser times out) or what usecase did you have in mind?
At least for 200MB headers in the kernel, I have seen TS perform pretty decently (and I think it can be optimized further so it ends up pretty usable), so the limit should be pretty large (like 500MB) IMO.
I think past a certain size limit there's just no point in wasting CPU cycles to even attempt highlighting, e.g. 500MB+.
Makes sense. Adding a limit should be really straightforward now: we just need to return an error from the parser function if the file size is larger than 500MB.
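To make the suggestion above concrete, here is a minimal sketch of such a size guard. It is not Helix's actual code: the names `MAX_HIGHLIGHT_BYTES`, `ParseError`, and `check_size_limit` are hypothetical, and reading 500MB as 500 * 1024 * 1024 bytes is an assumption.

```rust
use std::fmt;

/// Hypothetical limit taken from the 500MB figure discussed in this thread.
const MAX_HIGHLIGHT_BYTES: u64 = 500 * 1024 * 1024;

/// Hypothetical error type; a real parser function would likely have more variants.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    FileTooLarge { size: u64, limit: u64 },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::FileTooLarge { size, limit } => write!(
                f,
                "file is {size} bytes, above the {limit} byte highlighting limit"
            ),
        }
    }
}

/// Returns an error before any parsing is attempted when the file exceeds the limit.
fn check_size_limit(size: u64, limit: u64) -> Result<(), ParseError> {
    if size > limit {
        Err(ParseError::FileTooLarge { size, limit })
    } else {
        Ok(())
    }
}

fn main() {
    // A 600 MiB file is rejected, so the caller can fall back to plain text.
    match check_size_limit(600 * 1024 * 1024, MAX_HIGHLIGHT_BYTES) {
        Ok(()) => println!("highlighting enabled"),
        Err(err) => println!("highlighting disabled: {err}"),
    }
}
```

The caller would treat the error as "open without highlighting" rather than as a failure to open the file.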
I am not even able to open huge files, because it crashes on the file picker code-preview beforehand.
Is there a way to support partial syntax highlighting not based on TS? Possibly using something like syntect or similar?
Alternatively, I also think the ability to force treesitter parsing to run through a command like :ts-force-parse might be useful since we have default timeouts of 500ms and possibly file size based limits too in the future.
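As a rough illustration of how a forced parse could interact with the timeout and a size limit, here is a small sketch. Everything in it is hypothetical (`ParseBudget`, `parse_budget`, and the idea that a forced parse ignores both limits are assumptions, not an existing Helix API); only the 500ms default timeout comes from the thread.

```rust
use std::time::Duration;

/// Hypothetical description of how much work the parser is allowed to do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParseBudget {
    /// Do not parse at all; show the file as plain text.
    Skip,
    /// Parse, but give up once the timeout elapses.
    Timed(Duration),
    /// Parse to completion, as a `:ts-force-parse`-style command might request.
    Unbounded,
}

/// Decides the budget for one parse.
/// Assumption: an explicit user request overrides both the size limit and the timeout.
fn parse_budget(file_size: u64, size_limit: u64, timeout: Duration, forced: bool) -> ParseBudget {
    if forced {
        ParseBudget::Unbounded
    } else if file_size > size_limit {
        ParseBudget::Skip
    } else {
        ParseBudget::Timed(timeout)
    }
}

fn main() {
    let timeout = Duration::from_millis(500); // default timeout mentioned in the thread
    let limit = 500 * 1024 * 1024; // size limit proposed in the thread
    println!("{:?}", parse_budget(10 * 1024, limit, timeout, false));
    println!("{:?}", parse_budget(limit + 1, limit, timeout, false));
    println!("{:?}", parse_budget(limit + 1, limit, timeout, true));
}
```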
We don't want to support any highlighter except tree-sitter. It's possible that we may support LSP highlighting in the future, but regex highlighting would have to be implemented as a plugin.