difftastic: Add support for Nim (with tests)

Nim (formerly Nimrod) is a compiled systems language with type inference, macros, and memory safety. It's becoming more common; Nitter, for example, uses it.

This patch adds tree-sitter support from https://github.com/alaviss/tree-sitter-nim, which uses the MPL-2.0 license. If this isn't acceptable, I can try to find another implementation; e.g. https://github.com/alaviss/tree-sitter-nim/issues/11 mentions https://github.com/aMOPel/tree-sitter-nim.

Jan 31 '24 23:01 gcr

Thanks for the PR! I'm afraid I can't accept this in its current state: the parser is just too big (parser.c is 66MiB). Difftastic already has problems with the git repo being too big, and this parser is bigger than the largest parser currently included.

I'd like to support Nim, but I need a smaller parser.c: say, something under 30MiB.

Feb 05 '24 16:02 Wilfred

@Wilfred, if you're open to enabling the wasm feature for Tree-sitter (which adds a dependency on wasmtime), you could consider switching away from vendoring all of the Tree-sitter grammars, and instead allow users to add their own parsers at runtime via WASM files. With the wasm feature, the native Tree-sitter library can load Language objects from wasm files, but perform native parsing with the same Rust API as normal (only the lexing phase uses WASM, so performance is not impacted very much, and you're still free to Send the resulting syntax trees to other threads as normal).

You could probably make it seamless for users by bundling a list of known grammars (with file extensions and such) and just store URLs where the corresponding WASM files can be downloaded from.
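As a rough illustration of that idea, here is a minimal Rust sketch of a bundled grammar list keyed by file extension. The `GrammarEntry` struct, the `KNOWN_GRAMMARS` table, `grammar_for_path`, and the example.com URLs are hypothetical placeholders, not an existing difftastic or Tree-sitter API.

```rust
use std::path::Path;

/// One entry in a hypothetical bundled list of known grammars.
/// Field names and URLs are illustrative placeholders.
struct GrammarEntry {
    name: &'static str,
    extensions: &'static [&'static str],
    wasm_url: &'static str,
}

/// Shipped inside the binary: metadata only, no parser code.
static KNOWN_GRAMMARS: &[GrammarEntry] = &[
    GrammarEntry {
        name: "Nim",
        extensions: &["nim", "nims"],
        wasm_url: "https://example.com/grammars/tree-sitter-nim.wasm",
    },
    GrammarEntry {
        name: "Rust",
        extensions: &["rs"],
        wasm_url: "https://example.com/grammars/tree-sitter-rust.wasm",
    },
];

/// Find the grammar for a file by its extension, if any is known.
fn grammar_for_path(path: &str) -> Option<&'static GrammarEntry> {
    let ext = Path::new(path).extension()?.to_str()?;
    KNOWN_GRAMMARS
        .iter()
        .find(|g| g.extensions.iter().any(|e| *e == ext))
}
```

Only the small table would ship in the binary; the WASM file for a language would be fetched the first time a matching file is diffed.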

It's a very new Tree-sitter feature, developed for the Zed editor's new extension system, but it works pretty well. I think it might be well-suited to your use case and could solve the problem of needing to bundle a large set of languages.

I know this is off-topic; I just thought I'd mention it here, since this PR was linked from a HN thread.

Mar 21 '24 00:03 maxbrunsfeld

@maxbrunsfeld ooh, I am very interested in this! The difft binary is pretty large, and having a nice way to distribute parsers separately would really help. Some distro packagers have expressed a preference for not vendoring parsers too.

I need highlights.scm too though: difftastic needs to know which nodes are strings/comments, and tree-sitter complains if you load a highlighting file that doesn't match the loaded parser. How does Zed handle this?

(I imagine Zed also needs to associate file extensions with languages, just like difftastic, so maybe you have a solution for that metadata too?)
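A simplified Rust model of that requirement: treat highlights.scm as a list of (node kind, capture name) pairs, reject it if it mentions a node kind the loaded grammar doesn't define (roughly mirroring Tree-sitter refusing a query that doesn't match the parser), and keep only the comment and string captures. The pre-parsed pair format, the node kind names, and the function names are assumptions for illustration; real queries are S-expressions parsed by Tree-sitter itself.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Comment,
    String,
}

/// Map a highlight capture name such as "comment" or "string.special"
/// to the category a diff tool cares about. Other captures are ignored.
fn category_for_capture(capture: &str) -> Option<Category> {
    let base = capture.split('.').next().unwrap_or(capture);
    match base {
        "comment" => Some(Category::Comment),
        "string" => Some(Category::String),
        _ => None,
    }
}

/// `patterns` is a hypothetical, pre-parsed view of highlights.scm as
/// (node kind, capture name) pairs. Fails on the first node kind the
/// loaded grammar does not define.
fn classify_nodes(
    grammar_kinds: &HashSet<&str>,
    patterns: &[(&str, &str)],
) -> Result<HashMap<String, Category>, String> {
    let mut out = HashMap::new();
    for &(kind, capture) in patterns {
        if !grammar_kinds.contains(kind) {
            return Err(format!("query references unknown node kind `{kind}`"));
        }
        if let Some(cat) = category_for_capture(capture) {
            out.insert(kind.to_string(), cat);
        }
    }
    Ok(out)
}
```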

Mar 22 '24 22:03 Wilfred

Yeah, in Zed, extensions are specified via a combination of:

- a .wasm file for the Tree-sitter parser
- a set of .scm files containing queries for highlighting, language injection, outline symbols, etc.
- a TOML file with metadata about the language (user-facing name, file extensions, other editor configuration)
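A hypothetical Rust sketch of checking such a package using only file extensions: `.wasm` for the parser, `.scm` for queries, `.toml` for metadata. The `PackageFile` enum and the completeness rule (one parser, one metadata file, plus a `highlights.scm`, since that is the query difftastic needs) are illustrative assumptions, not Zed's actual validation logic.

```rust
/// The three kinds of files in a Zed-style language package.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PackageFile {
    Parser,   // .wasm build of the Tree-sitter grammar
    Query,    // .scm query (highlights, injections, outline, ...)
    Metadata, // .toml language metadata
}

/// Classify a file name by its extension; unknown files return None.
fn classify_file(name: &str) -> Option<PackageFile> {
    let (_, ext) = name.rsplit_once('.')?;
    match ext {
        "wasm" => Some(PackageFile::Parser),
        "scm" => Some(PackageFile::Query),
        "toml" => Some(PackageFile::Metadata),
        _ => None,
    }
}

/// Hypothetical rule: exactly one parser, exactly one metadata file,
/// and a highlights.scm among the queries.
fn is_complete_package(files: &[&str]) -> bool {
    let count = |kind: PackageFile| {
        files.iter().filter(|f| classify_file(f) == Some(kind)).count()
    };
    let has_highlights = files.iter().any(|f| *f == "highlights.scm");
    count(PackageFile::Parser) == 1 && count(PackageFile::Metadata) == 1 && has_highlights
}
```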

I'm guessing Difftastic would want a slightly different packaging format, because you don't need all of the stuff Zed uses, but I think a similar approach would probably work.

For now, these WASM files would need to be hosted somewhere. The WASM mode of compiling parsers isn't widely used yet, but down the road, I'd love to start standardizing how Tree-sitter grammars publish their WASM builds and queries. Maybe just as GitHub release assets.
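If grammars did publish their builds as GitHub release assets, a tool could derive download URLs from the repository and tag alone, since GitHub serves release assets at `https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>`. A minimal Rust sketch; the `<repo>.wasm` plus `highlights.scm` asset naming convention and the example repository are hypothetical.

```rust
/// Download URL for a GitHub release asset, following GitHub's
/// /releases/download/<tag>/<asset> URL scheme.
fn release_asset_url(owner: &str, repo: &str, tag: &str, asset: &str) -> String {
    format!("https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}")
}

/// Hypothetical naming convention: each grammar release ships
/// `<repo>.wasm` and `highlights.scm` as assets.
fn grammar_asset_urls(owner: &str, repo: &str, tag: &str) -> Vec<String> {
    let wasm = format!("{repo}.wasm");
    [wasm.as_str(), "highlights.scm"]
        .iter()
        .map(|asset| release_asset_url(owner, repo, tag, asset))
        .collect()
}
```

Pinning a tag rather than using a "latest" URL would keep the parser and its queries in sync, which matters because a query that doesn't match its parser is rejected.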

Mar 23 '24 00:03 maxbrunsfeld