Support ignoring differences that only consist of variable/function name changes (eg. within minified JavaScript)

Open 0xdevalias opened this issue 1 year ago • 6 comments

Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This can obscure the real changes and make the diff output less useful for understanding what actually changed.

Proposed Feature

I would like to suggest a feature in difftastic that ignores changes in variable/function names in minified JavaScript code. This would be immensely helpful in reducing the noise in diffs of minified builds, so that the output focuses on the actual code changes rather than variable name churn.

Current Workarounds / Limitations

Currently, alternative git diff algorithms such as patience, histogram, and minimal can be used to reduce the size of the diff somewhat. For example:

⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
  116000

⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
   35826

However, these methods still include variable name changes in their output, which can often lead to quite significant 'noise', particularly on larger files.

Other potential workarounds involve pre-processing the files to standardize their variable/function names or post-processing the diff output to detect and suppress chunks where the only changes are in variable/function names.
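
The normalization idea behind both workarounds can be sketched roughly as follows. This is a minimal regex-based heuristic, not a real parser: `normalize_identifiers`, `only_renames`, and the `_vN` placeholder scheme are hypothetical names made up for illustration, the keyword list is only a subset, and template literals, regex literals, and comments are deliberately ignored. Each non-keyword identifier is replaced with a placeholder numbered by first appearance, so two fragments that differ only by a consistent rename normalize to the same text.

```python
import re

# Subset of JavaScript reserved words that must never be renamed (not exhaustive).
JS_KEYWORDS = frozenset(
    "break case catch class const continue default delete do else export "
    "extends false finally for function if import in instanceof let new null "
    "return super switch this throw true try typeof var void while with yield".split()
)

# Alternatives are tried in order: string literals, property names after a dot,
# numeric literals, then plain identifiers. Only the last group gets renamed.
TOKEN = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<prop>(?<=\.)[A-Za-z_$][\w$]*)
  | (?P<number>\d[\w.]*)
  | (?P<ident>[A-Za-z_$][\w$]*)
""", re.VERBOSE)


def normalize_identifiers(source: str) -> str:
    """Replace each non-keyword identifier with _v0, _v1, ... by first appearance."""
    names = {}

    def rename(match: re.Match) -> str:
        name = match.group("ident")
        if name is None or name in JS_KEYWORDS:
            return match.group(0)  # strings, properties, numbers, keywords stay as-is
        return names.setdefault(name, f"_v{len(names)}")

    return TOKEN.sub(rename, source)


def only_renames(old: str, new: str) -> bool:
    """True when the two fragments are identical after identifier normalization."""
    return normalize_identifiers(old) == normalize_identifiers(new)
```

A diff post-processor could then drop any hunk whose removed and added sides satisfy `only_renames`. Because placeholders are numbered per fragment, this only works when both sides cover the same span of code; a proper implementation would want a real JavaScript parser with scope analysis.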

Expected Outcome

The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.

See Also

  • https://github.com/0xdevalias/chatgpt-source-watch/issues/3
    • https://github.com/0xdevalias/chatgpt-source-watch/issues/3#issuecomment-1916176579
      • https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479
  • https://github.com/ast-grep/ast-grep/issues/901
  • https://github.com/afnanenayet/diffsitter/issues/819
  • https://github.com/afnanenayet/diffsitter#node-filtering

0xdevalias, Jan 30 '24 04:01

Originally I thought that diffsitter might be a good answer to this need, but after playing with it, its diff output is fairly subpar compared to basic git diff etc., and its performance seemed SUPER slow on a large file: ~12.29min, compared to seemingly 6.746sec for difftastic (though with a lot of "8.39 MiB exceeded DFT_BYTE_LIMIT" messages in the output).

You can see further context/screenshots/etc of the output here:

  • https://github.com/0xdevalias/chatgpt-source-watch/issues/3#issuecomment-1916176579

Edit: difftastic might not be a good fit either if I can't figure out how to get these parsing issues fixed up:

Edit: It seems that when DFT_BYTE_LIMIT is exceeded, difftastic falls back to a text diff, so that's not really a fair time comparison:

  • https://github.com/Wilfred/difftastic/blob/master/CHANGELOG.md#020-released-20th-february-2022
    • difftastic will now use a text diff for large files that are too big to parse in a reasonable amount of time. This threshold is configurable with --byte-limit and DFT_BYTE_LIMIT.

I tried overriding that in my .gitconfig:

# https://github.com/Wilfred/difftastic
[difftool "difftastic"]
  cmd = difft --byte-limit 20971520 "$LOCAL" "$REMOTE"

I then ran it again, but just got a different set of errors:

 ⇒ time git difftool --tool difftastic HEAD~1 HEAD -- unpacked/_next/static/chunks/pages/_app.js | subl
git difftool --tool difftastic HEAD~1 HEAD --   12.42s user 1.10s system 79% cpu 17.043 total
subl  0.01s user 0.02s system 0% cpu 17.248 total
_app.js --- 1/674 --- Text (2 JavaScript parse errors, exceeded DFT_PARSE_ERROR_LIMIT)
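
Since the fallback message names DFT_PARSE_ERROR_LIMIT, one thing that might be worth trying is raising both limits through the environment. The sketch below assumes that DFT_PARSE_ERROR_LIMIT is read from the environment in the same way as DFT_BYTE_LIMIT (the message suggests this, but it is not confirmed here). `difft_env` and `run_difft` are hypothetical helper names, and the limit values are arbitrary.

```python
import os
import subprocess


def difft_env(byte_limit: int, parse_error_limit: int) -> dict:
    """Copy the current environment and override the two difftastic limits."""
    env = dict(os.environ)
    env["DFT_BYTE_LIMIT"] = str(byte_limit)
    # Assumption: difft honours this variable, as its fallback message implies.
    env["DFT_PARSE_ERROR_LIMIT"] = str(parse_error_limit)
    return env


def run_difft(old_path: str, new_path: str,
              byte_limit: int = 20 * 1024 * 1024,
              parse_error_limit: int = 10) -> subprocess.CompletedProcess:
    """Run difft on two files with raised limits and capture its output."""
    return subprocess.run(
        ["difft", old_path, new_path],
        env=difft_env(byte_limit, parse_error_limit),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode and stderr manually instead of raising
    )
```

Even with a higher limit, parse errors in the file would presumably still need to be understood, so this only helps to check whether a structural diff is reachable at all.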

Originally posted by @0xdevalias in https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479

0xdevalias, Jan 30 '24 06:01