
Disabling tree-sitter on big files

Open: kirawi opened this issue 4 years ago • 18 comments

Opening large files (e.g. >100 MB) that have an associated syntax highlighting grammar leads to high memory usage and slow load times: https://github.com/tree-sitter/tree-sitter/issues/222

Ideally, opening such a file would prompt the user to disable syntax highlighting for that file.

kirawi, Jun 21 '21 20:06

> Refactor tree-sitter-highlight to work like the Atom one, recomputing partial tree updates.

This is already done: the base layer is parsed incrementally, and injections are recomputed.

I personally don't think we should bother highlighting above a certain very large file size.

archseer, Jun 22 '21 00:06

I agree, but some people (like me) like syntax highlighting on big files even if it isn't that useful. That's why I'd like this to be a plugin in the future; I might write it myself if I'm not too busy. As for the prompt, I'm not sure how much it would depend on tui, since we're switching to termwiz.

kirawi, Jun 22 '21 00:06

~I'll take this issue on (prompt to disable syntax highlighting on big files).~ Unassigning myself.

kirawi, Jun 22 '21 01:06

Wouldn't we eventually have to use syntect anyway to highlight languages that have no tree-sitter grammar? Having to install a plugin to get syntax highlighting for some filetypes seems annoying.

sudormrfbin, Jun 22 '21 02:06

Maybe. syntect seems fairly lightweight, and it shares many dependencies with Helix. However, it might be a problem because it's regex-based, and we're already trying to move away from the standard regex library.

kirawi, Jun 22 '21 18:06

@kirawi syntect uses the oniguruma regex engine by default, though it can be configured to use fancy-regex instead. If by the "standard Regex library" you mean the regex crate, why are you trying to move away from it? cargo tree shows that regex is used by 5 different crates in the dependency graph.

Aloso, Feb 01 '22 12:02

I have no plans to drop the regex library, and I also don't plan on supporting syntect. The scope of this issue should be purely about disabling highlighting on 200 MB+ files.

archseer, Feb 01 '22 13:02

Hi, any update on this one? I opened a 21k+ line Lua file, and it caused a stack overflow in tree-sitter, crashing Helix. While I gather more info on that particular stack overflow and report it upstream, it would be nice to be able to disable tree-sitter on large files from within Helix.

pppKin, Jun 22 '22 07:06

This is a deal-breaker for me as well. As soon as I open an 820k+ LoC C file for registers, Helix crashes. If I change the file extension to .txt, it loads instantly.

tgharib, Aug 12 '22 18:08

Apart from disabling tree-sitter for files larger than xx MB, I think we could also implement an "open as plain text" command in the file picker.

Edit: not just the file picker, but also search results, references, etc.

pppKin, Aug 15 '22 12:08

I am seeing this as well with a long .json file. Thankfully I found this issue, and changing the language to .txt fixed it.

This should be the default for large files.

msdrigg, Mar 01 '23 22:03

I still think there should be an upper bound, but we can set it to something really high (200 to 500 MB?).

archseer, May 18 '23 06:05

> I still think there should be an upper bound but we can set it to something really high (200~500MB?)

Are you just concerned about the 500 ms delay when first opening such a large file (the time until the parser times out), or what use case did you have in mind?

At least for 200 MB headers in the kernel, I have seen tree-sitter perform pretty decently (and I think it can be optimized further so it ends up pretty usable), so the limit should be pretty large (like 500 MB) IMO.

pascalkuthe, May 18 '23 10:05

I think past a certain size limit, e.g. 500 MB+, there's just no point in wasting CPU cycles to even attempt highlighting.

archseer, May 18 '23 13:05

Makes sense. Adding a limit should be really straightforward now: the parser function just needs to return an error if the file is larger than 500 MB.

pascalkuthe, May 18 '23 13:05
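The size guard pascalkuthe describes amounts to a check in front of the parser. A minimal Rust sketch of the idea, under stated assumptions: MAX_HIGHLIGHT_BYTES, SyntaxError and check_highlight_size are hypothetical names, not Helix's actual API, and the 500 MB limit (treated here as 500 MiB) is simply the figure floated in this thread.

```rust
use std::fmt;

// Hypothetical cap based on the ~500 MB figure from this thread,
// interpreted as 500 MiB. Not an actual Helix constant.
const MAX_HIGHLIGHT_BYTES: usize = 500 * 1024 * 1024;

/// Hypothetical error returned instead of a syntax tree for oversized files.
#[derive(Debug, PartialEq)]
enum SyntaxError {
    TooLarge { len: usize, limit: usize },
}

impl fmt::Display for SyntaxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyntaxError::TooLarge { len, limit } => write!(
                f,
                "file is {len} bytes, above the {limit}-byte highlighting limit"
            ),
        }
    }
}

/// Hypothetical guard run before handing a document to tree-sitter.
/// Returns Ok(()) when parsing may proceed, or an error so the caller
/// can fall back to showing the file as plain text.
fn check_highlight_size(len_bytes: usize, limit: usize) -> Result<(), SyntaxError> {
    if len_bytes > limit {
        Err(SyntaxError::TooLarge { len: len_bytes, limit })
    } else {
        Ok(())
    }
}

fn main() {
    // A 100 MiB file passes the check; a 600 MiB file is rejected.
    for len in [100 * 1024 * 1024, 600 * 1024 * 1024] {
        match check_highlight_size(len, MAX_HIGHLIGHT_BYTES) {
            Ok(()) => println!("{len} bytes: highlighting enabled"),
            Err(e) => println!("plain text: {e}"),
        }
    }
}
```

Returning an error rather than panicking lets the caller keep the document open without highlighting, which matches the plain-text workaround people describe earlier in the thread.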

I can't even open huge files, because Helix crashes in the file picker's code preview first.

iocron, Jun 04 '23 04:06

Is there a way to support partial syntax highlighting that isn't based on tree-sitter, possibly using syntect or something similar?

Alternatively, I think a command like :ts-force-parse that forces tree-sitter parsing to run might be useful, since we have a default timeout of 500 ms and may add file-size-based limits in the future.

MasterAwesome, Jul 03 '24 06:07

We don't want to support any highlighter except tree-sitter. It's possible that we may support LSP highlighting in the future, but regex highlighting would have to be implemented as a plugin.

kirawi, Jul 03 '24 22:07