tree-sitter-kotlin
tree-sitter-kotlin copied to clipboard
Add notes on contributing, especially w.r.t the parser verification check
New PRs regularly fail the CI check that verifies that the generated parser matches the committed parser.c, presumably because their development machine didn't generate a byte-for-byte-exact copy of the parser as the GitHub Actions runner.
Perhaps we could add some notes to clarify the motivation behind this check and how to successfully generate a conforming parser.c. Something along the lines of:
- The check exists since it is infeasible to review a ~20 MB C source file. Suggestions for something less brittle than a byte-for-byte check are of course welcome, however.
- Unfortunately, the output of
tree-sitter generateonly seems to be deterministic when run on the same platform. Since the GitHub-hosted runners use x86_64 Linux, contributors have to generate the parser from that platform too (even aarch64 Linux/macOS will produce a different output). - One option besides running Linux natively or e.g. through a Docker container would be to use the
regenerate-parser.ymlworkflow in their own fork. This would potentially require customizing the branch it runs on.
Do we really need parser.c in the repo. The tree-sitter-swift or tree-sitter-java don't have it in the repo either.
https://github.com/alex-pinkus/tree-sitter-swift#frequently-asked-questions https://github.com/tree-sitter/tree-sitter-java
tree-sitter-java does in fact check it in (see here), but I do like the approach of tree-sitter-swift, that is, omitting the parser and publishing it as a CI artifact. That would probably also help with repo growth, which may become problematic due to the frequent changes in parser.c.
The only reason I am a bit hesitant is that it seems to deviate from the dominant convention of including the parser, even if it's comparatively large. See e.g.
Might be worth keeping an eye on
- https://github.com/tree-sitter/tree-sitter/issues/930
- https://github.com/tree-sitter/tree-sitter/issues/730
which propose to remove the generated files from the repos.
...but also include it as a release artefact for semantic versioned(!) releases. (This parser takes a comparatively long time to generate, so just dropping it would not be preferred albeit workable. A separate "release branch", though, is highly discouraged -- some parsers chose to do that in the early days, but it's more headache for everyone than it's worth.)
In either case, please keep committing the generated json, as these are enough to generate the parser without node. (Not that your parser requires npm, as far as I can tell, but it's the principle.)
Yes, to be clear, I don't propose to remove anything for now, just to monitor what upstream does. When the official recommendation changes, we can revisit this.
Sure; I was just explaining current upstream policy ;)