diffsitter
Widen language support to all languages supported by tree-sitter
Tree-sitter states that it has "fairly complete" support for 34 languages with 12 more in development.
Right now this project states its support for 11 languages. Is there a way we can increase this number?
Yeah it's actually pretty easy to add new tree sitter grammars (or at least, it should be). I have them set up as submodules, and I compile the tree sitter libraries + link against them + generate bindings in the build.rs file. I'm happy to expand support myself sometime this week, but also very open to someone else taking a crack at this
Gah, actually looks simple enough but I'm not a rust dev >_<.
I won't have time to learn a new language and learn a new set of build tools for months.
Even though git submodules are a little niche and take a little research and trial and error to get right, I wonder if this task could be tagged as a good first issue
...
I'd like to see this work with JSON. I'll try to add a json grammar and see where we can go from there.
I didn't see a "JSON" parser, but I'm guessing it could be understood by either the JavaScript, TypeScript, or YAML parsers.
It seems like all JSON theoretically should be valid Javascript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON
@bar9 The process to add a grammar:
- Add the tree-sitter submodule to the project
- Add an entry for the submodule in the build script, making sure that its name is unique among all the other grammars
- Ensure tests pass (there is a test to make sure that tree-sitter can actually load the parsers)
- Test by running the program on a javascript/JSON file
The function that compiles the grammar: https://github.com/afnanenayet/diffsitter/blob/2e916d01ae0b654dd5f77728c97a2db4e255d27b/build.rs#L59-L83. All the grammars I've seen ship C or C++ sources that get compiled into libraries.
An entry for a grammar: https://github.com/afnanenayet/diffsitter/blob/2e916d01ae0b654dd5f77728c97a2db4e255d27b/build.rs#L154-L158
The other code in the build.rs file exists mostly to do some codegen to create the functions to load the parsers.
I should probably create a contributing.md file with these instructions
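To make the steps above a bit more concrete, here is a minimal sketch of what a grammar entry, the uniqueness check, and the loader codegen could look like. This is not the actual code in build.rs: the `GrammarEntry` struct, its fields, and the helper functions are hypothetical names made up for illustration. The only thing taken as given is that tree-sitter grammars export a C function named `tree_sitter_<name>`.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical description of one grammar entry in the build script.
struct GrammarEntry {
    /// Name that must be unique among all grammars, e.g. "json".
    name: &'static str,
    /// Directory inside the submodule that holds the generated sources.
    src_dir: PathBuf,
    /// C sources to compile, e.g. `parser.c` and optionally a scanner.
    c_sources: Vec<&'static str>,
}

/// Returns the first grammar name that appears more than once, if any.
fn find_duplicate_name(entries: &[GrammarEntry]) -> Option<&'static str> {
    let mut seen = HashSet::new();
    entries.iter().map(|e| e.name).find(|name| !seen.insert(*name))
}

/// Full paths of the sources that would be handed to the C compiler.
fn source_paths(entry: &GrammarEntry) -> Vec<PathBuf> {
    entry.c_sources.iter().map(|f| entry.src_dir.join(f)).collect()
}

/// Generates the `extern "C"` declaration used to load the parser.
fn extern_decl(entry: &GrammarEntry) -> String {
    format!(
        "extern \"C\" {{ fn tree_sitter_{}() -> Language; }}",
        entry.name
    )
}
```

In a real build script the paths from `source_paths` would be passed to a C compiler (for example through the `cc` crate), and the string from `extern_decl` would be written into a generated Rust file.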
Parsing JavaScript should also parse JSON, since JSON is a part of JavaScript, but I also suggested TypeScript because it is a superset of JavaScript, and YAML because it is a superset of JSON, so both should understand JSON and other things beyond JSON simultaneously.
@afnanenayet thanks for the hints, I'm just trying the workflow. I guess we can also have a look at this: https://github.com/tree-sitter/tree-sitter-json. Might be faster than using a TS/JS parser, since fewer matches have to be considered. Supporting JSON should be a priority for this project. If it is fast, the potential uses are endless: in redux-like stores for deep diffing complex state changes, in/near databases for history of embedded JSON docs, ...
Just a detail: YAML is not a superset of JSON, but structurally equivalent. This means the resulting AST is the same, but you certainly need completely different parsers (JSON uses braces and brackets for hierarchy, YAML uses indentation)
I didn't know YAML and JSON are structurally equivalent, that's neat. And yeah, super happy to have any help at all! I'm also not particularly attached to the build script so if you see a better way to set this up I'm all ears. I've always thought it was a bit messy to be honest.
So thanks again for the instructions, the build works like a charm. However I don't think the diff is doing what we would expect it to do. E.g. if I have a test1.json with the contents:
{
"hello": "world"
}
and a test2.json with the contents:
{ "hello": "world2" }
The output is
old.json -> new.json
====================
0:
--
+ { "hello": "world2" }
1:
--
- "hello": "world"
This is just a diff by line. However, the curly braces differ only in the surrounding whitespace, so they should not appear in the diff. Do I need to do something else, e.g. configure parser tokens?
No you don't (or at least, shouldn't) have to do any configuration on your own beyond just adding a grammar
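For reference, the behavior expected in the question can be sketched with a whitespace-insensitive token comparison. This is only an illustration, assuming the comparison is made over leaf tokens: `json_tokens` is a hand-rolled, simplified stand-in for walking tree-sitter leaf nodes, and `changed_tokens` compares positions directly instead of running a real diff algorithm. Both names are hypothetical and are not part of diffsitter.

```rust
/// Splits JSON text into leaf tokens, discarding insignificant whitespace.
/// Simplified stand-in for collecting the leaf nodes of a syntax tree.
fn json_tokens(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            // String literal: consume through the closing quote, honoring escapes.
            let mut s = String::new();
            s.push(chars.next().unwrap());
            while let Some(ch) = chars.next() {
                s.push(ch);
                if ch == '\\' {
                    if let Some(escaped) = chars.next() {
                        s.push(escaped);
                    }
                } else if ch == '"' {
                    break;
                }
            }
            tokens.push(s);
        } else if "{}[],:".contains(c) {
            tokens.push(c.to_string());
            chars.next();
        } else {
            // Number, true, false, or null: read until the next delimiter.
            let mut s = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || "{}[],:\"".contains(ch) {
                    break;
                }
                s.push(ch);
                chars.next();
            }
            tokens.push(s);
        }
    }
    tokens
}

/// Returns the (old, new) token pairs that differ at the same position.
/// Positional comparison is only meaningful when both inputs have the same
/// number of tokens; a real diff would use an LCS or Myers algorithm.
fn changed_tokens(old: &str, new: &str) -> Vec<(String, String)> {
    json_tokens(old)
        .into_iter()
        .zip(json_tokens(new))
        .filter(|(a, b)| a != b)
        .collect()
}
```

Applied to the two files above, the braces and the "hello" key match token for token, and the only differing pair is the string value.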
Running diffsitter 0.8.1 on macOS via homebrew:
⇒ diffsitter --version
diffsitter 0.8.1
It lists support for typescript / tsx:
⇒ diffsitter list
This program was compiled with support for:
- bash
- c_sharp
- cpp
- css
- go
- hcl
- java
- json
- ocaml
- php
- python
- ruby
- rust
- tsx
- typescript
Yet by default, it will fail to run against a JavaScript file:
⇒ git difftool --tool diffsitter HEAD~1 HEAD -- unpacked/_next/static/\[buildHash\]/_buildManifest.js
Error: Unsupported file type with no fallback command specified.
Until a file-association override is added to the config (${XDG_HOME:-$HOME}/.config/diffsitter/config.json5):
// ..snip..
"grammar": {
"dylib-overrides": null,
"file-associations": {
"js": "typescript",
"jsx": "tsx"
},
},
// ..snip..
This would seem like a useful thing to be included in the default config that diffsitter uses:
⇒ diffsitter dump-default-config
{
"file-associations": null,
"formatting": {
"default": "unified",
"unified": {
"addition": {
"highlight": null,
"regular-foreground": "green",
"emphasized-foreground": "green",
"bold": true,
"underline": false,
"prefix": "+ "
},
"deletion": {
"highlight": null,
"regular-foreground": "red",
"emphasized-foreground": "red",
"bold": true,
"underline": false,
"prefix": "- "
}
},
"json": {
"pretty_print": false
},
"custom": {}
},
"grammar": {
"dylib-overrides": null,
"file-associations": null
},
"input-processing": {
"split-graphemes": true,
"exclude-kinds": null,
"include-kinds": null
},
"fallback-cmd": null
}
I also noted that adding the overrides to the root-level file-associations key of that default config didn't seem to work; they only took effect when I added them to the file-associations key under grammar.
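As a rough illustration of the lookup being asked for, here is a sketch of resolving a file extension to a grammar, with user-supplied file-associations taking precedence over a built-in table. The `builtin_language` table and the `resolve_language` function are hypothetical and only meant to show the idea; the actual defaults and lookup order in diffsitter may differ.

```rust
use std::collections::HashMap;

/// Hypothetical built-in extension table; the real defaults may differ.
fn builtin_language(ext: &str) -> Option<&'static str> {
    match ext {
        "ts" => Some("typescript"),
        "tsx" => Some("tsx"),
        "json" => Some("json"),
        "rs" => Some("rust"),
        _ => None,
    }
}

/// Resolves a file extension to a grammar name. Entries from the user's
/// `file-associations` map win over the built-in table.
fn resolve_language(ext: &str, overrides: &HashMap<String, String>) -> Option<String> {
    overrides
        .get(ext)
        .cloned()
        .or_else(|| builtin_language(ext).map(str::to_string))
}
```

With an empty override map, "js" resolves to nothing, which matches the "Unsupported file type" error above. Once "js" is mapped to "typescript" and "jsx" to "tsx", both resolve. Shipping those two mappings in the default table would remove the need for the override.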