tree-sitter-diff
feat: split command filename to old/new file
This PR is mostly motivated by the following problem:
- The diff command has two filenames, but the parser is currently parsing everything after `diff --git` as a single filename node, which is wrong.
- This results in different semantic interpretations during highlighting, whereas ideally it should have the exact semantic meaning.

I propose it should be `diff --git (old_file) (new_file)` so that these filenames get the same highlighting/semantic meaning as the ones in the diff output, `--- (old_file)` and `+++ (new_file)`.
On this input (taken from your playground)
diff --git a/.gitmodules b/.gitmodules
index d5bd61c9e..422671b4e 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -174,3 +174,7 @@
path = helix-syntax/languages/tree-sitter-git-commit
url = https://github.com/the-mikedavis/tree-sitter-git-commit.git
shallow = true
+[submodule "helix-syntax/languages/tree-sitter-git-diff"]
+ path = helix-syntax/languages/tree-sitter-git-diff
+ url = https://github.com/the-mikedavis/tree-sitter-git-diff.git
+ shallow = true
the query output is now
a.diff
pattern: 4
capture: 4 - variable.builtin, start: (0, 0), end: (0, 38), text: `diff --git a/.gitmodules b/.gitmodules`
pattern: 1
capture: 1 - keyword, start: (0, 11), end: (0, 24), text: `a/.gitmodules`
pattern: 0
capture: 0 - string, start: (0, 25), end: (0, 38), text: `b/.gitmodules`
pattern: 2
capture: 2 - constant, start: (1, 6), end: (1, 15), text: `d5bd61c9e`
pattern: 2
capture: 2 - constant, start: (1, 17), end: (1, 26), text: `422671b4e`
pattern: 1
capture: 1 - keyword, start: (2, 0), end: (2, 17), text: `--- a/.gitmodules`
pattern: 0
capture: 0 - string, start: (3, 0), end: (3, 17), text: `+++ b/.gitmodules`
pattern: 3
capture: 3 - attribute, start: (4, 0), end: (4, 19), text: `@@ -174,3 +174,7 @@`
pattern: 0
capture: 0 - string, start: (8, 0), end: (8, 58), text: `+[submodule "helix-syntax/languages/tree-sitter-git-diff"]`
pattern: 0
capture: 0 - string, start: (9, 0), end: (9, 52), text: `+ path = helix-syntax/languages/tree-sitter-git-diff`
pattern: 0
capture: 0 - string, start: (10, 0), end: (10, 65), text: `+ url = https://github.com/the-mikedavis/tree-sitter-git-diff.git`
pattern: 0
capture: 0 - string, start: (11, 0), end: (11, 16), text: `+ shallow = true`
Notice that `a/.gitmodules` and `b/.gitmodules` from `diff --git a/.gitmodules b/.gitmodules` are now picked up by the query, and they get the same captures as `--- a/.gitmodules` and `+++ b/.gitmodules`, respectively.
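For illustration only, the intended split of the command line can be sketched in Python. This is not the grammar change itself: `HEADER` and `split_command` are hypothetical names, and the sketch assumes neither path contains whitespace.

```python
import re

# Hypothetical helper, not part of tree-sitter-diff. It mirrors the proposed
# `diff --git (old_file) (new_file)` shape, assuming paths have no whitespace.
HEADER = re.compile(r"^diff --git (?P<old_file>\S+) (?P<new_file>\S+)$")


def split_command(line):
    """Return each filename with its (start, end) columns, or None if no match."""
    m = HEADER.match(line)
    if m is None:
        return None
    return {
        name: (m.group(name), m.span(name))
        for name in ("old_file", "new_file")
    }


result = split_command("diff --git a/.gitmodules b/.gitmodules")
print(result)
```

For the header above, the column spans line up with the `keyword` and `string` captures on row 0 of the query output.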
I was interested in adding this but it's not straightforward if you have filenames with spaces in them:
diff --git a/a b.txt b/a b.txt
index 86e041d..46add00 100644
--- a/a b.txt
+++ b/a b.txt
@@ -1,3 +1,3 @@
foo
-bar
+baz
baz
On this branch:
$ tree-sitter parse f.diff
(source [0, 0] - [9, 0]
  (command [0, 0] - [0, 20]
    (old_file [0, 11] - [0, 14])
    (new_file [0, 15] - [0, 20]))
  (ERROR [0, 21] - [0, 30]
    (ERROR [0, 21] - [0, 30]))
  (index [1, 0] - [1, 29]
    (commit [1, 6] - [1, 13])
    (commit [1, 15] - [1, 22])
    (mode [1, 23] - [1, 29]))
  (old_file [2, 0] - [2, 7]
    (filename [2, 4] - [2, 7]))
  (ERROR [2, 8] - [2, 13]
    (ERROR [2, 8] - [2, 13]))
  (new_file [3, 0] - [3, 7]
    (filename [3, 4] - [3, 7]))
  (ERROR [3, 8] - [3, 13]
    (ERROR [3, 8] - [3, 13]))
  (location [4, 0] - [4, 15]
    (linerange [4, 3] - [4, 7])
    (linerange [4, 8] - [4, 12]))
  (context [5, 0] - [5, 4])
  (deletion [6, 0] - [6, 4])
  (addition [7, 0] - [7, 4])
  (context [8, 0] - [8, 4]))
f.diff 0 ms (ERROR [0, 21] - [0, 30])
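To make the ambiguity concrete, here is a rough Python sketch (not code from this branch) that lists every way the command line could be split at a single space. The helper names are hypothetical, and `split_same_name` rests on a loud assumption: the default `a/` and `b/` prefixes are in use and the file is not renamed, so both sides name the same path.

```python
PREFIX = "diff --git "


def candidate_splits(line):
    """List every (old, new) pair obtained by splitting the rest at one space."""
    if not line.startswith(PREFIX):
        return []
    rest = line[len(PREFIX):]
    return [
        (rest[:i], rest[i + 1:])
        for i, ch in enumerate(rest)
        if ch == " "
    ]


def split_same_name(line):
    """Pick the split where both sides agree after the a/ and b/ prefixes.

    Assumption (not from the PR): default prefixes and no rename.
    """
    for old, new in candidate_splits(line):
        if old.startswith("a/") and new.startswith("b/") and old[2:] == new[2:]:
            return old, new
    return None


header = "diff --git a/a b.txt b/a b.txt"
for pair in candidate_splits(header):
    print(pair)
print(split_same_name(header))
```

The header has three candidate splits, and only the equal-names check singles out `a/a b.txt` and `b/a b.txt`. That kind of check is not something a plain whitespace-delimited grammar rule expresses, so it would likely have to live outside the ordinary rules.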
I see, that's a good point. Let me think about this and revisit it once I have a solution.