Support ignoring differences that consist only of variable/function name changes (e.g. within minified JavaScript)
Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This can obscure the real changes and make the diff output less useful for understanding what actually changed.
Proposed Feature
I would like to suggest a feature in difftastic that ignores changes in variable/function names in minified JavaScript code. This would be immensely helpful in reducing the noise in diffs of minified builds, focusing on the actual code changes rather than variable name churn.
Current Workarounds / Limitations
Currently, git's alternative diff algorithms (patience, histogram, and minimal) can be used to reduce the size of the diff somewhat. For example:
⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
116000
⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
35826
However, these methods still include variable name changes in their output, which can often lead to quite significant 'noise', particularly on larger files.
Other potential workarounds involve pre-processing the files to standardize their variable/function names or post-processing the diff output to detect and suppress chunks where the only changes are in variable/function names.
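As a rough illustration of the pre-processing workaround, here is a minimal sketch that renames identifiers to placeholders in order of first appearance, so that two builds differing only in minified names normalize to the same text. Everything in it is hypothetical (the `normalize_identifiers` name, the `_vN` placeholder scheme, the trimmed keyword list), and it uses a regex tokenizer rather than a real JavaScript parser, so template literals, regex literals, comments, and object keys are not handled.

```python
import re

# Deliberately incomplete keyword list (assumption: enough for a demo only).
JS_KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "delete", "do", "else", "export", "extends", "false", "finally", "for",
    "function", "if", "import", "in", "instanceof", "let", "new", "null",
    "return", "super", "switch", "this", "throw", "true", "try", "typeof",
    "undefined", "var", "void", "while", "with", "yield",
}

TOKEN = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')   # quoted string literals
  | (?P<number>\d[\w.]*)                              # numeric literals (1e5, 0x1f)
  | (?P<ident>[A-Za-z_$][\w$]*)                       # identifiers and keywords
  | (?P<other>\.\.\.|\s+|.)                           # spread, whitespace, punctuation
""", re.VERBOSE | re.DOTALL)


def normalize_identifiers(source):
    """Rename each identifier to _v0, _v1, ... in order of first appearance."""
    names = {}   # original name -> placeholder
    out = []
    prev = ""    # previous non-whitespace token
    for match in TOKEN.finditer(source):
        text = match.group()
        is_name = match.lastgroup == "ident" and text not in JS_KEYWORDS
        # Names after "." are property accesses, which minifiers
        # usually leave alone, so they are kept as-is.
        if is_name and prev != ".":
            text = names.setdefault(text, f"_v{len(names)}")
        if not text.isspace():
            prev = text
        out.append(text)
    return "".join(out)


old_build = "function a(b,c){return b.push(c),b.length}"
new_build = "function x(y,z){return y.push(z),y.length}"
print(normalize_identifiers(old_build))
# function _v0(_v1,_v2){return _v1.push(_v2),_v1.length}
print(normalize_identifiers(old_build) == normalize_identifiers(new_build))
# True
```

One caveat with this numbering scheme: a variable introduced near the top of the file shifts every later placeholder, which would produce its own noise. A scope-aware rename built on a real parser would likely be needed for large bundles.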
Expected Outcome
The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.
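To make the expected outcome concrete, here is a small sketch of "compare on a name-insensitive form, but print the original text". It is only a line- and token-level stand-in for real AST diffing, not how difftastic works: each line is reduced to a key with identifiers replaced by per-line placeholders, Python's `difflib.SequenceMatcher` aligns the keys, and the original lines are printed for anything that differs. The names `line_key` and `name_insensitive_diff` are made up for this example.

```python
import difflib
import re

# Trimmed keyword list (assumption: enough for a demo only).
KEYWORDS = {
    "break", "case", "catch", "const", "continue", "else", "false", "for",
    "function", "if", "in", "let", "new", "null", "return", "this", "throw",
    "true", "try", "typeof", "var", "void", "while",
}

# Group 1: quoted strings (kept verbatim). Group 2: identifiers that are not
# part of a number or a property access (not preceded by a word char, $ or .).
TOKEN = re.compile(
    r"""("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|(?<![\w$.])([A-Za-z_$][\w$]*)"""
)


def line_key(line):
    """Replace identifiers with placeholders numbered per line."""
    names = {}

    def replace(match):
        if match.group(1) is not None or match.group(2) in KEYWORDS:
            return match.group(0)
        return names.setdefault(match.group(2), f"_{len(names)}")

    return TOKEN.sub(replace, line.strip())


def name_insensitive_diff(old, new):
    """Diff on normalized keys, but report the original lines."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(
        a=[line_key(line) for line in old_lines],
        b=[line_key(line) for line in new_lines],
        autojunk=False,
    )
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical once names are ignored
        out += [f"-{line}" for line in old_lines[i1:i2]]
        out += [f"+{line}" for line in new_lines[j1:j2]]
    return out


old = "function a(b){\n  var c=b.items;\n  return c.length>0\n}"
new = "function x(y){\n  var z=y.items;\n  return z.length>1\n}"
print("\n".join(name_insensitive_diff(old, new)))
# -  return c.length>0
# +  return z.length>1
```

Because placeholders are numbered per line, this sketch cannot tell when a renamed use actually points at a different variable than before; tracking bindings across scopes is the part that genuinely needs an AST.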
See Also
- https://github.com/0xdevalias/chatgpt-source-watch/issues/3
- https://github.com/0xdevalias/chatgpt-source-watch/issues/3#issuecomment-1916176579
- https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479
- https://github.com/ast-grep/ast-grep/issues/901
- https://github.com/afnanenayet/diffsitter/issues/819
- https://github.com/afnanenayet/diffsitter#node-filtering
Originally I thought that diffsitter might be a good answer to this need, but after playing with it, its diff output is fairly subpar compared to basic git diff etc., and its performance seemed SUPER slow on a large file: ~12.29 min, compared to seemingly 6.746 sec for difftastic (though with a lot of "8.39 MiB exceeded DFT_BYTE_LIMIT" messages in the output).
You can see further context/screenshots/etc of the output here:
- https://github.com/0xdevalias/chatgpt-source-watch/issues/3#issuecomment-1916176579
Edit: difftastic might not be a good fit either if I can't figure out how to get these parsing issues fixed up:
Edit: It seems that when DFT_BYTE_LIMIT is exceeded, difftastic falls back to a text diff, so that's not really a fair time comparison:
- https://github.com/Wilfred/difftastic/blob/master/CHANGELOG.md#020-released-20th-february-2022
difftastic will now use a text diff for large files that are too big to parse in a reasonable amount of time. This threshold is configurable with --byte-limit and DFT_BYTE_LIMIT.
I tried overriding that in my .gitconfig:
# https://github.com/Wilfred/difftastic
[difftool "difftastic"]
  cmd = difft --byte-limit 20971520 "$LOCAL" "$REMOTE"
And then running it again, but then I just got a different set of errors:
⇒ time git difftool --tool difftastic HEAD~1 HEAD -- unpacked/_next/static/chunks/pages/_app.js | subl
git difftool --tool difftastic HEAD~1 HEAD --  12.42s user 1.10s system 79% cpu 17.043 total
subl  0.01s user 0.02s system 0% cpu 17.248 total
_app.js --- 1/674 --- Text (2 JavaScript parse errors, exceeded DFT_PARSE_ERROR_LIMIT)
Originally posted by @0xdevalias in https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479