diffr icon indicating copy to clipboard operation
diffr copied to clipboard

Improve performance for diffs with very large hunks

Open mookid opened this issue 6 years ago • 12 comments

See https://scripter.co/git-diff-minified-js-and-css/

On the repository https://github.com/gohugoio/hugoDocs, git log -p -- themes/gohugoioTheme/static/dist/app.bundle.js | diffr takes ~9min on my machine.

One problematic commit: 0960c5fb36e9abcf3b492a523943268e89066543

Explore possible workarounds:

  • stop LCS when the middle snake is good enough?
  • delect long lines?

mookid avatar Jul 20 '19 00:07 mookid

I'm hitting diffs where git add -p can take hours, and it took a while to realize diffr was the cause— i had gotten used to beautiful, useful diffs and forgotten i was not on git standard.

One thing i do not understand is why it has seemingly suddenly gotten much worse for me. As in, in just the last week, went from never thinking about it (effectively instant) to being completely unusable, across multiple projects of the kind i always work with. Some underlying tool that might have gotten an update?

Anyhow strong vote for some way to detect and bail on long lines. Maybe something configurable, and to not try to diff lines longer than 500 characters by default.

mlncn avatar Sep 07 '22 20:09 mlncn

sorry about the bad experience. of course, if you have a reproduction case, I would be very happy to have a look.

An relatively easy way to proceed would be to periodically check for maximal processing time for a given hunk, and bail if necessary (and fallback to the default diff processing)

I will most likely no do the coding myself (my company is not very supportive of open source at all) but I can provide some guidance if anyone would want try to tackle it

mookid avatar Sep 07 '22 20:09 mookid

Part of a reason for the slowness might be a massive amount of generating excess escape sequences. See, e.g. this two lines change looks like this with the escape sequences:

image

Clearly, most of them are unneeded.

Hi-Angel avatar Jan 20 '23 01:01 Hi-Angel

I think that the slow cases are bad cases of the Myers algo, which is quadratic in number of tokens if the before/after are very different. I don't think that the added control chars are enough to make it slower by itself.

mookid avatar Jan 20 '23 09:01 mookid

I experience the issue with VERY large diffs. 500 lines was fine but 1k+ it becomes a problem.

Here's a script to reproduce the issue

tr -dc a-z1-4 </dev/urandom |
  tr 1-2 ' \n' |
  awk 'length==0 || length>50' |
  tr 3-4 ' ' |
  sed 's/^ *//' |
  cat -s |
  sed 's/ / /g' |
  fmt |
  head -n 2000 > /tmp/random-1.txt &&
  tr -dc a-z1-4 </dev/urandom |
  tr 1-2 ' \n' |
  awk 'length==0 || length>50' |
  tr 3-4 ' ' |
  sed 's/^ *//' |
  cat -s |
  sed 's/ / /g' |
  fmt |
  head -n 2000 > /tmp/random-2.txt  &&
  git diff --no-index /tmp/random-1.txt /tmp/random-2.txt |
  diffr

CervEdin avatar Apr 04 '23 06:04 CervEdin

hi @CervEdin

thanks for the report! I have a patch for that, which is very simple. unfortunately, my company does not really allow open source work, so this is stuck there for a while.

mookid avatar Apr 06 '23 20:04 mookid

Hi @mookid

That's great! However, I'm a bit perplexed. Aren't you the maintainer?

CervEdin avatar Apr 19 '23 15:04 CervEdin

I am. I can provide some help to someone motivated to write the code, but can't submit it myself.

The Myers algorithm is incremental: if find the shortest path of length D for each D increasing; my patch just stops after D large-ish (if D is large, the diff will have poor readability annyway) and just falls back to "remove everything, add everything".

mookid avatar Apr 20 '23 02:04 mookid

Fair enough. Maybe I take a look at some point. For now I just switched back the pager of things like git log to diff-highlight

On Thu, Apr 20, 2023, 4:17 AM Nathan Moreau @.***> wrote:

I am. I can provide some help to someone motivated to write the code, but can't submit it myself.

The Myers algorithm is incremental: if find the shortest path of length D for each D increasing; my patch just stops after D large-ish (if D is large, the diff will have poor readability annyway) and just falls back to "remove everything, add everything".

— Reply to this email directly, view it on GitHub https://github.com/mookid/diffr/issues/14#issuecomment-1515616491, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXL5ESXXHC5NYXPZZO7DE3XCCMCLANCNFSM4IFNDB6Q . You are receiving this because you were mentioned.Message ID: @.***>

CervEdin avatar Apr 22 '23 12:04 CervEdin

@CervEdin #96 should help with your problem, probably (your script does not work on my system for some reason, I get tr: Illegal byte sequence when running it)

mookid avatar Sep 06 '23 12:09 mookid

@mookid thanks! Maybe the test script I wrote is using something non POSIX. It works in WSL and git-bash.

I was able to test that 9bfc9f264739c2324d7fcd1f66038d06b67ff116 allows using --large-diff-threshold 200 , to skip. But I don't think I saw a mention of the flag in --help.

I'm wasn't able to cargo install --path . in master, geting issues with is_terminal.

error[E0658]: use of unstable library feature 'is_terminal'
 --> src/cli_args.rs:7:5
  |
7 | use std::io::IsTerminal;
  |     ^^^^^^^^^^^^^^^^^^^
  |
  = note: see issue #98070 <https://github.com/rust-lang/rust/issues/98070> for more information

Thank you for adding this feature, it's going to make the tool much more usable, at least for me ♥

CervEdin avatar Oct 15 '23 20:10 CervEdin

I was able to test that 9bfc9f264739c2324d7fcd1f66038d06b67ff116 allows using --large-diff-threshold 200 , to skip. But I don't think I saw a mention of the flag in --help.

no, I made it a 'private' flag for now

I'm wasn't able to cargo install --path . in master, geting issues with is_terminal.

you need a recent-enough version of the rust toolchain, as is_terminal has been stabilised relatively recently; or use the unstable version: cargo +nightly install --path . probably works (not tested)

Thank you for adding this feature, it's going to make the tool much more usable, at least for me ♥

<3

mookid avatar Oct 16 '23 08:10 mookid