ocaml-tree-sitter-semgrep

Fix 'tree-sitter generate' memory explosion on Hack grammar

Open mjambon opened this issue 2 years ago • 0 comments

This is preventing us from merging and using https://github.com/returntocorp/ocaml-tree-sitter-core/pull/48.

What we know:

  • Processing the tree-sitter-hack grammar after rewriting by ocaml-tree-sitter has always consumed a lot of memory. It now requires over 16 GB, which is more than a reasonable host should have to support.
  • ocaml-tree-sitter unhides all the rules by removing the leading underscore from each rule name. Re-hiding all these rules except the entry point leads to high CPU usage, and the run times out after 50 min (on @mjambon's old laptop).
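
To make the unhide/re-hide experiment concrete, here is a minimal Python sketch of the renaming on a `grammar.json`-style dictionary. It is not the ocaml-tree-sitter implementation (which is written in OCaml); the function names are hypothetical, and the sketch assumes that the first rule is the entry point, that renaming does not produce name collisions, and that only `SYMBOL` references inside `rules` need rewriting (fields such as `inline`, `conflicts`, and `supertypes` are ignored).

```python
def rename_symbols(node, mapping):
    """Recursively rewrite SYMBOL references in a grammar.json fragment."""
    if isinstance(node, dict):
        if node.get("type") == "SYMBOL" and node.get("name") in mapping:
            return {**node, "name": mapping[node["name"]]}
        return {key: rename_symbols(value, mapping) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_symbols(item, mapping) for item in node]
    return node


def rename_rules(grammar, mapping):
    """Rename rule definitions and their references, keeping rule order."""
    renamed = {
        mapping.get(name, name): rename_symbols(body, mapping)
        for name, body in grammar["rules"].items()
    }
    return {**grammar, "rules": renamed}


def unhide_all(grammar):
    """Drop leading underscores so every rule becomes visible.

    Assumption: no collision such as `_expr` and `expr` both existing.
    """
    mapping = {
        name: name.lstrip("_")
        for name in grammar["rules"]
        if name.startswith("_")
    }
    return rename_rules(grammar, mapping)


def rehide_all_but_entry(grammar):
    """Prefix every visible rule except the first (entry point) with `_`."""
    names = list(grammar["rules"])
    mapping = {
        name: "_" + name
        for name in names[1:]
        if not name.startswith("_")
    }
    return rename_rules(grammar, mapping)
```

Applying `unhide_all` and then `rehide_all_but_entry` to a loaded `grammar.json` would give the two variants discussed above, which can then be fed to `tree-sitter generate` separately for comparison.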

We need to investigate tree-sitter, which is a Rust program. The first step would be to come up with a minimal test case and file a bug with the tree-sitter project. Right now, we know that the modified Hack grammar is problematic, but other grammars of similar size don't show excessive memory or CPU consumption.
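
While shrinking the grammar toward a minimal test case, it helps to record peak memory for each candidate. The following is a rough sketch, assuming a Unix host and Python's standard `resource` module; the helper name and the grammar directory in the usage note are hypothetical. Note that `RUSAGE_CHILDREN` reports the largest child waited for so far, so each measurement should run in a fresh Python process.

```python
import resource
import subprocess
import sys


def run_with_peak_rss(cmd, cwd=None, timeout_s=None):
    """Run cmd and return (exit code or None on timeout, peak RSS in KiB).

    Unix only. The peak is the maximum over all children of this process
    that have been waited for, not just the one started here.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        returncode = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        returncode = None
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # macOS reports bytes; Linux reports kilobytes
    return returncode, peak
```

For example, `run_with_peak_rss(["tree-sitter", "generate"], cwd="path/to/hack-grammar", timeout_s=3000)` would cap the run at 50 minutes and report how much memory it used up to that point.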

mjambon, Nov 07 '22 22:11