Support ignoring differences that only consist of variable/function name changes (e.g. within minified JavaScript)
Is your feature request related to a problem? Please describe.
Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This noise obscures the real changes and makes the diff output much less useful for understanding what actually changed.
Describe the solution you'd like
I propose adding a feature to diffsitter that ignores changes in variable/function names within minified JavaScript code. This would drastically reduce the noise when diffing minified builds, making it possible to focus on the actual code changes rather than on fluctuating variable names.
Describe alternatives you've considered
As a workaround, I've experimented with git diff's alternative algorithms (patience, histogram, and minimal), which reduce the diff size somewhat. For instance, the choice of algorithm alone changes the number of lines in the diff output significantly:
```
⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
116000
⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
35826
```
However, every algorithm still reports variable name changes, which remain a substantial source of 'noise', especially in larger files.
Other potential solutions include pre-processing the files to normalize variable/function names, or post-processing the diff output to filter out hunks whose only changes are variable/function names.
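As a rough illustration of the pre-processing idea, the Python sketch below canonicalizes identifiers by renaming them to `v0`, `v1`, ... in order of first appearance, so two builds that differ only by a consistent file-wide rename normalize to identical text. `TOKEN_RE`, the abbreviated `KEYWORDS` set and `normalize_identifiers` are hypothetical names made up for this sketch, not part of diffsitter or any bundler. It is a naive regex lexer, not a real JavaScript parser: it ignores comments, template literals and regex literals, and it leaves property names after `.` untouched on the assumption that minifiers usually keep those.

```python
import re

# Hypothetical, deliberately naive tokenizer (NOT a real JavaScript lexer):
# comments, template literals and regex literals are not handled.
TOKEN_RE = re.compile(
    r"""(?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')"""
    r"|(?P<number>\d[0-9A-Za-z_]*)"  # keeps 1e3 / 0x1F from being split into identifiers
    r"|(?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<other>\.\.\.|\s+|.)",  # spread operator, whitespace, any other character
    re.DOTALL,
)

# Abbreviated keyword list (an assumption; extend as needed). These are never renamed.
KEYWORDS = {
    "var", "let", "const", "function", "return", "if", "else", "for", "while",
    "do", "new", "this", "typeof", "instanceof", "in", "of", "true", "false",
    "null", "undefined", "class", "extends", "switch", "case", "break",
    "continue", "default", "throw", "try", "catch", "finally", "void",
    "delete", "async", "await", "yield", "import", "export", "from",
}


def normalize_identifiers(source: str) -> str:
    """Rename identifiers to v0, v1, ... in order of first appearance."""
    mapping: dict[str, str] = {}
    out: list[str] = []
    prev_significant = ""  # last non-whitespace token seen
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        is_property = prev_significant == "."  # e.g. the `length` in `b.length`
        if match.lastgroup == "ident" and text not in KEYWORDS and not is_property:
            text = mapping.setdefault(text, f"v{len(mapping)}")
        out.append(text)
        if not text.isspace():
            prev_significant = text
    return "".join(out)
```

For example, `normalize_identifiers("function a(b){return b.length+1e3}")` and the same call on `"function x(y){return y.length+1e3}"` both produce `function v0(v1){return v1.length+1e3}`. Two limitations are worth keeping in mind: minifiers reuse short names per scope, so a single file-wide mapping only cancels renames that happen to be consistent across the whole file, and one newly introduced identifier near the top shifts the numbering of everything after it. A script like this could be hooked into git as a `textconv` diff driver, but a scope-aware, AST-based approach would be more robust.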
Additional context
The ideal solution would still produce diff output in text format, but perform the actual comparison at the AST level, ignoring variable/function name changes.
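To make "ignoring name changes at the AST level" more concrete, here is a minimal Python sketch that compares two syntax trees modulo a consistent renaming of identifiers. `Node` is a hypothetical stand-in for a tree-sitter node, and treating only the `identifier` kind as renameable is an assumption. The one-to-one mapping check means that, for example, replacing `a + a` with `x + y` is still reported as a difference, whereas ignoring identifier text entirely would hide it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal stand-in for a tree-sitter node."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Assumption: which node kinds count as renameable "names".
RENAMEABLE_KINDS = {"identifier"}


def equal_modulo_renaming(a: Node, b: Node) -> bool:
    """True if the trees match except for a consistent renaming of identifiers."""
    forward: dict[str, str] = {}   # name in a -> name in b
    backward: dict[str, str] = {}  # name in b -> name in a

    def walk(x: Node, y: Node) -> bool:
        if x.kind != y.kind or len(x.children) != len(y.children):
            return False
        if not x.children:  # leaf node
            if x.kind in RENAMEABLE_KINDS:
                # Require a one-to-one mapping between old and new names.
                if forward.setdefault(x.text, y.text) != y.text:
                    return False
                return backward.setdefault(y.text, x.text) == x.text
            return x.text == y.text  # non-name leaves must match exactly
        return all(walk(cx, cy) for cx, cy in zip(x.children, y.children))

    return walk(a, b)
```

This only answers "are these two trees equivalent?"; a diff tool would need to apply the same idea while aligning nodes, and a real implementation would also need to respect scopes rather than use one mapping for the whole file.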
I suspect this might already be possible (at least to some degree) with the following, though I haven't yet found good examples or docs explaining how to use it:
- https://github.com/afnanenayet/diffsitter
  > A tree-sitter based AST difftool to get meaningful semantic diffs
  >
  > You can also filter which tree sitter nodes are considered in the diff through the config file.
- https://github.com/afnanenayet/diffsitter#node-filtering
  > You can filter the nodes that are considered in the diff by setting `include_nodes` or `exclude_nodes` in the config file. `exclude_nodes` always takes precedence over `include_nodes`, and the type of a node is the kind of a tree-sitter node. This feature currently only applies to leaf nodes, but we could exclude nodes recursively if there's demand for it.
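Here is my reading of the quoted filtering rules, sketched in Python: only leaf nodes are considered, a kind listed in `exclude` is always dropped, and when `include` is non-empty only kinds listed there are kept. `Node`, `leaves` and `filtered_leaves` are hypothetical names, not diffsitter's code or config format; treating an empty `include` as "keep everything", and the kind names used (`identifier`, `number`), are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node: a kind, source text, children."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> Iterator[Node]:
    """Yield leaf nodes in source order (filtering only applies to leaves)."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def filtered_leaves(
    root: Node,
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> list[str]:
    """Return the text of the leaves that would take part in the diff."""
    include = include or set()
    exclude = exclude or set()
    kept = []
    for leaf in leaves(root):
        if leaf.kind in exclude:  # exclude always takes precedence
            continue
        if include and leaf.kind not in include:  # assumption: empty include keeps all
            continue
        kept.append(leaf.text)
    return kept
```

Excluding `identifier` leaves makes a pure rename disappear from the compared leaf sequence, but it would equally hide a genuine identifier change (e.g. calling a different function), so leaf filtering alone is a blunt tool for this use case.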
I'm hoping to play around with it a bit more now, but wanted to capture this while it was fresh in my mind.
See Also
- https://github.com/0xdevalias/chatgpt-source-watch/issues/3
- https://github.com/0xdevalias/chatgpt-source-watch/issues/3#issuecomment-1916176579
- https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479
- https://github.com/ast-grep/ast-grep/issues/901
- https://github.com/Wilfred/difftastic/issues/631
So this fits well with an idea I had before: allowing users to supply tree-sitter queries that filter which nodes are diffed. That is general enough that you could, for example, filter for or against certain node types and ignore variable names.