tree-sitter-kotlin icon indicating copy to clipboard operation
tree-sitter-kotlin copied to clipboard

Add notes on contributing, especially w.r.t the parser verification check

Open fwcd opened this issue 2 years ago • 6 comments

New PRs regularly fail the CI check that verifies that the generated parser matches the committed parser.c, presumably because their development machine didn't generate a byte-for-byte-exact copy of the parser as the GitHub Actions runner.

Perhaps we could add some notes to clarify the motivation behind this check and how to successfully generate a conforming parser.c. Something along the lines of:

  • The check exists since it is infeasible to review a ~20 MB C source file. Suggestions for something less brittle than a byte-for-byte check are of course welcome, however.
  • Unfortunately, the output of tree-sitter generate only seems to be deterministic when run on the same platform. Since the GitHub-hosted runners use x86_64 Linux, contributors have to generate the parser from that platform too (even aarch64 Linux/macOS will produce a different output).
  • One option besides running Linux natively or e.g. through a Docker container would be to use the regenerate-parser.yml workflow in their own fork. This would potentially require customizing the branch it runs on.

fwcd avatar Jul 15 '23 16:07 fwcd

Do we really need parser.c in the repo. The tree-sitter-swift or tree-sitter-java don't have it in the repo either.

https://github.com/alex-pinkus/tree-sitter-swift#frequently-asked-questions https://github.com/tree-sitter/tree-sitter-java

ketkarameya avatar Jul 17 '23 03:07 ketkarameya

tree-sitter-java does in fact check it in (see here), but I do like the approach of tree-sitter-swift, that is, omitting the parser and publishing it as a CI artifact. That would probably also help with repo growth, which may become problematic due to the frequent changes in parser.c.

The only reason I am a bit hesitant is that it seems to deviate from the dominant convention of including the parser, even if it's comparatively large. See e.g.

fwcd avatar Jul 21 '23 00:07 fwcd

Might be worth keeping an eye on

  • https://github.com/tree-sitter/tree-sitter/issues/930
  • https://github.com/tree-sitter/tree-sitter/issues/730

which propose to remove the generated files from the repos.

fwcd avatar Jul 29 '24 15:07 fwcd

...but also include it as a release artefact for semantic versioned(!) releases. (This parser takes a comparatively long time to generate, so just dropping it would not be preferred albeit workable. A separate "release branch", though, is highly discouraged -- some parsers chose to do that in the early days, but it's more headache for everyone than it's worth.)

In either case, please keep committing the generated json, as these are enough to generate the parser without node. (Not that your parser requires npm, as far as I can tell, but it's the principle.)

clason avatar Jul 29 '24 15:07 clason

Yes, to be clear, I don't propose to remove anything for now, just to monitor what upstream does. When the official recommendation changes, we can revisit this.

fwcd avatar Aug 02 '24 01:08 fwcd

Sure; I was just explaining current upstream policy ;)

clason avatar Aug 02 '24 07:08 clason