diffsitter

Widen language support to all languages supported by tree-sitter

Open spartanatreyu opened this issue 3 years ago • 14 comments

Tree-sitter states that it has "fairly complete" support for 34 languages with 12 more in development.

Right now this project lists support for 11 languages. Is there a way we can increase that number?

spartanatreyu, Jul 18 '21 23:07

Yeah, it's actually pretty easy to add new tree-sitter grammars (or at least, it should be). I have them set up as git submodules, and the build.rs file compiles the tree-sitter grammar libraries, links against them, and generates bindings. I'm happy to expand support myself sometime this week, but I'm also very open to someone else taking a crack at this.

afnanenayet, Jul 19 '21 02:07

Gah, it actually looks simple enough, but I'm not a Rust dev >_<.

I won't have time to learn a new language and a new set of build tools for months.

Even though git submodules are a little niche and take some research and trial and error to get right, I wonder if this task could be tagged as a "good first issue"...

spartanatreyu, Jul 19 '21 23:07

I'd like to see this work with JSON. I'll try to add a JSON grammar and see where we can go from there.

bar9, Jul 20 '21 04:07

I didn't see a "JSON" parser, but I'm guessing JSON could be understood by the JavaScript, TypeScript, or YAML parser.

spartanatreyu, Jul 20 '21 04:07

It seems like all JSON should theoretically be valid JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON

afnanenayet, Jul 20 '21 06:07

@bar9 The process for adding a grammar:

  1. Add the tree-sitter grammar to the project as a git submodule
  2. Add an entry for the submodule to the build script, making sure its name is unique among all the other grammars
  3. Ensure the tests pass (there is a test that checks that tree-sitter can actually load the parsers)
  4. Test by running the program on a JavaScript/JSON file

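Step 2 boils down to keeping a list of grammar entries whose names never collide. Below is a minimal Rust sketch of that uniqueness check; the `GrammarEntry` struct, its fields, the submodule paths, and `find_duplicate_name` are all hypothetical illustrations, not the actual types used in diffsitter's build.rs.

```rust
use std::collections::HashSet;

/// Hypothetical description of one grammar submodule (not diffsitter's real type).
#[allow(dead_code)]
#[derive(Debug)]
pub struct GrammarEntry {
    /// Name used for the generated loader; must be unique across all grammars.
    pub name: &'static str,
    /// Path to the git submodule, relative to the repository root (illustrative).
    pub submodule_path: &'static str,
    /// C sources to compile, relative to the grammar's `src/` directory.
    pub c_sources: &'static [&'static str],
}

/// Returns the first name that appears more than once, if any,
/// so a build script can fail early with a clear message.
pub fn find_duplicate_name(entries: &[GrammarEntry]) -> Option<&'static str> {
    let mut seen = HashSet::new();
    for entry in entries {
        // `insert` returns false when the name was already in the set.
        if !seen.insert(entry.name) {
            return Some(entry.name);
        }
    }
    None
}
```

A build script could call a check like this before compiling anything and panic with the offending name, so that a clash presumably surfaces as an obvious build error instead of a confusing codegen or link failure later.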
The function that compiles the grammar: https://github.com/afnanenayet/diffsitter/blob/2e916d01ae0b654dd5f77728c97a2db4e255d27b/build.rs#L59-L83. All the grammars I've seen ship C or C++ sources that get compiled into libraries.

An entry for a grammar: https://github.com/afnanenayet/diffsitter/blob/2e916d01ae0b654dd5f77728c97a2db4e255d27b/build.rs#L154-L158

The rest of the code in build.rs mostly does codegen to create the functions that load the parsers.
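To make the codegen step more concrete: a tree-sitter grammar's C code exports an entry point named `tree_sitter_<name>` (for example `tree_sitter_json`), and the Rust side needs an `extern "C"` declaration for it. The sketch below only builds that source text as a `String`. The function name `generate_loader_source` and the shape of the generated `language_for` lookup are hypothetical, and the `fn() -> tree_sitter::Language` signature assumes the older tree-sitter Rust bindings from around the time of this thread.

```rust
use std::fmt::Write;

/// Builds Rust source that declares each grammar's C entry point and a
/// name-based lookup function. Purely illustrative; not diffsitter's output.
pub fn generate_loader_source(names: &[&str]) -> String {
    let mut out = String::new();
    out.push_str("extern \"C\" {\n");
    for name in names {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "    fn tree_sitter_{name}() -> tree_sitter::Language;").unwrap();
    }
    out.push_str("}\n\n");
    out.push_str("pub fn language_for(name: &str) -> Option<tree_sitter::Language> {\n");
    out.push_str("    match name {\n");
    for name in names {
        // `{{` and `}}` emit literal braces for the generated unsafe block.
        writeln!(out, "        \"{name}\" => Some(unsafe {{ tree_sitter_{name}() }}),").unwrap();
    }
    out.push_str("        _ => None,\n    }\n}\n");
    out
}
```

In a build script, a string like this would typically be written to a file under `OUT_DIR` and pulled into the crate with `include!(concat!(env!("OUT_DIR"), "/generated_grammars.rs"))`, where the file name is again just an example.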

I should probably create a contributing.md file with these instructions.

afnanenayet, Jul 20 '21 06:07

It seems like all JSON should theoretically be valid JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON

A JavaScript parser should also parse JSON, since JSON is a subset of JavaScript. I also suggested TypeScript because it is a superset of JavaScript, and YAML because it is a superset of JSON, so both should understand JSON as well as syntax beyond it.

spartanatreyu, Jul 21 '21 00:07

@afnanenayet thanks for the hints, I'm just trying out the workflow. I guess we could also have a look at https://github.com/tree-sitter/tree-sitter-json. It might be faster than using a TS/JS parser, since fewer matches have to be considered. Supporting JSON should be a priority for this project. If it is fast, the potential uses are endless: deep-diffing complex state changes in redux-like stores, keeping a history of embedded JSON documents in or near databases, ...

bar9, Jul 21 '21 04:07

Just a detail: YAML is not a superset of JSON, but it is structurally equivalent. This means the resulting AST is the same, but you certainly need completely different parsers (JSON uses curly braces and square brackets for hierarchy, YAML uses indentation).

bar9, Jul 21 '21 04:07

I didn't know YAML and JSON were structurally equivalent, that's neat. And yeah, I'm super happy to have any help at all! I'm also not particularly attached to the build script, so if you see a better way to set this up, I'm all ears. I've always thought it was a bit messy, to be honest.

afnanenayet, Jul 21 '21 05:07

Thanks again for the instructions; the build works like a charm. However, I don't think the diff is doing what we would expect. For example, if I have a test1.json with the contents:

{
    "hello": "world"
}

and a test2.json with the contents:

{ "hello": "world2" }

the output is:

old.json -> new.json
====================

0:
--
+ { "hello": "world2" }

1:
--
-     "hello": "world"

This is just a line-by-line diff. The curly braces differ only in whitespace, so they should not appear in the diff. Do I need to do something else, e.g. configure parser tokens?
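For reference, the behavior being asked for (braces that differ only in whitespace do not show up) can be illustrated by diffing whitespace-insensitive tokens instead of lines. This is a hypothetical stand-in, not diffsitter's implementation: `tokenize` is a hand-rolled, JSON-only tokenizer and `diff_tokens` is a plain longest-common-subsequence diff, whereas diffsitter works from the syntax tree that tree-sitter produces.

```rust
/// Splits JSON text into tokens, discarding whitespace. Punctuation is one token
/// each; string literals (with backslash escapes) and bare words/numbers are one token.
pub fn tokenize(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if "{}[]:,".contains(c) {
            tokens.push(c.to_string());
            i += 1;
        } else if c == '"' {
            let start = i;
            i += 1;
            while i < chars.len() && chars[i] != '"' {
                // Skip the escaped character so `\"` does not end the string.
                if chars[i] == '\\' {
                    i += 1;
                }
                i += 1;
            }
            i = (i + 1).min(chars.len()); // include the closing quote if present
            tokens.push(chars[start..i].iter().collect());
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && !"{}[]:,\"".contains(chars[i]) {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}

/// A token present on only one side of the diff.
#[derive(Debug, PartialEq)]
pub enum Change {
    Deleted(String),
    Added(String),
}

/// O(n*m) longest-common-subsequence diff over two token lists.
pub fn diff_tokens(old: &[String], new: &[String]) -> Vec<Change> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] = length of the LCS of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table, emitting only the tokens that are not shared.
    let (mut i, mut j) = (0, 0);
    let mut changes = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            changes.push(Change::Deleted(old[i].clone()));
            i += 1;
        } else {
            changes.push(Change::Added(new[j].clone()));
            j += 1;
        }
    }
    changes.extend(old[i..].iter().cloned().map(Change::Deleted));
    changes.extend(new[j..].iter().cloned().map(Change::Added));
    changes
}
```

On the two files above, the token lists differ only in `"world"` versus `"world2"`, so that pair is the entire diff; the braces match as tokens regardless of layout.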

bar9, Jul 24 '21 05:07

No, you don't (or at least, shouldn't) have to do any configuration on your own beyond adding a grammar.

afnanenayet, Jul 24 '21 18:07

Running diffsitter 0.8.1 on macOS via homebrew:

⇒ diffsitter --version
diffsitter 0.8.1

It lists support for typescript / tsx:

⇒ diffsitter list
This program was compiled with support for:
- bash
- c_sharp
- cpp
- css
- go
- hcl
- java
- json
- ocaml
- php
- python
- ruby
- rust
- tsx
- typescript

Yet by default, it will fail to run against a JavaScript file:

⇒ git difftool --tool diffsitter HEAD~1 HEAD -- unpacked/_next/static/\[buildHash\]/_buildManifest.js
Error: Unsupported file type with no fallback command specified.

Until a file-association override is added to the config (${XDG_HOME:-$HOME}/.config/diffsitter/config.json5):

// ..snip..
  "grammar": {
    "dylib-overrides": null,
    "file-associations": {
      "js": "typescript",
      "jsx": "tsx"
    },
  },
// ..snip..

This seems like a useful thing to include in the default config that diffsitter uses:

⇒ diffsitter dump-default-config
{
  "file-associations": null,
  "formatting": {
    "default": "unified",
    "unified": {
      "addition": {
        "highlight": null,
        "regular-foreground": "green",
        "emphasized-foreground": "green",
        "bold": true,
        "underline": false,
        "prefix": "+ "
      },
      "deletion": {
        "highlight": null,
        "regular-foreground": "red",
        "emphasized-foreground": "red",
        "bold": true,
        "underline": false,
        "prefix": "- "
      }
    },
    "json": {
      "pretty_print": false
    },
    "custom": {}
  },
  "grammar": {
    "dylib-overrides": null,
    "file-associations": null
  },
  "input-processing": {
    "split-graphemes": true,
    "exclude-kinds": null,
    "include-kinds": null
  },
  "fallback-cmd": null
}

I also noticed that adding the file associations to the root-level "file-associations" key of that default config didn't seem to work; it only seemed to work when I added them to the "file-associations" key under "grammar".
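The lookup that the default-config suggestion implies can be sketched as follows. It assumes, hypothetically, that user-configured associations are checked first, then a table of built-in defaults, and that the fallback command is used only when neither matches (the case that produces the "Unsupported file type with no fallback command specified" error). `resolve` and `Resolution` are illustrative names, not diffsitter's code.

```rust
use std::collections::HashMap;

/// Outcome of picking a grammar for a file (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Resolution<'a> {
    /// A grammar was found for the extension.
    Grammar(&'a str),
    /// No grammar matched, but a fallback diff command is configured.
    Fallback(&'a str),
    /// Neither a grammar nor a fallback command is available.
    Unsupported,
}

/// Resolves a file extension to a grammar name: user overrides first,
/// then built-in defaults, then the fallback command if one is configured.
pub fn resolve<'a>(
    ext: &str,
    overrides: &'a HashMap<String, String>,
    defaults: &'a HashMap<String, String>,
    fallback_cmd: Option<&'a str>,
) -> Resolution<'a> {
    if let Some(grammar) = overrides.get(ext).or_else(|| defaults.get(ext)) {
        Resolution::Grammar(grammar.as_str())
    } else if let Some(cmd) = fallback_cmd {
        Resolution::Fallback(cmd)
    } else {
        Resolution::Unsupported
    }
}
```

Under these assumptions, putting `"js": "typescript"` into the built-in defaults table would let `.js` files resolve without any user config, which is what the suggestion above amounts to.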

0xdevalias, Jan 30 '24 06:01