tree-sitter-kotlin
tree-sitter-kotlin copied to clipboard
Add notes on contributing, especially w.r.t the parser verification check
New PRs regularly fail the CI check that verifies that the generated parser matches the committed parser.c
, presumably because their development machine didn't generate a byte-for-byte-exact copy of the parser as the GitHub Actions runner.
Perhaps we could add some notes to clarify the motivation behind this check and how to successfully generate a conforming parser.c
. Something along the lines of:
- The check exists since it is infeasible to review a ~20 MB C source file. Suggestions for something less brittle than a byte-for-byte check are of course welcome, however.
- Unfortunately, the output of
tree-sitter generate
only seems to be deterministic when run on the same platform. Since the GitHub-hosted runners use x86_64 Linux, contributors have to generate the parser from that platform too (even aarch64 Linux/macOS will produce a different output). - One option besides running Linux natively or e.g. through a Docker container would be to use the
regenerate-parser.yml
workflow in their own fork. This would potentially require customizing the branch it runs on.
Do we really need parser.c
in the repo. The tree-sitter-swift or tree-sitter-java don't have it in the repo either.
https://github.com/alex-pinkus/tree-sitter-swift#frequently-asked-questions https://github.com/tree-sitter/tree-sitter-java
tree-sitter-java
does in fact check it in (see here), but I do like the approach of tree-sitter-swift
, that is, omitting the parser and publishing it as a CI artifact. That would probably also help with repo growth, which may become problematic due to the frequent changes in parser.c
.
The only reason I am a bit hesitant is that it seems to deviate from the dominant convention of including the parser, even if it's comparatively large. See e.g.