
git commit will hang if lots of files are in staging, like when merging master

Open sktguha opened this issue 7 years ago • 15 comments

For now I am using git commit --no-verify, but maybe git-confirm could have some kind of logic to skip the check if there are too many files in staging.
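A minimal sketch of what such a skip could look like, assuming the decision is made purely on the number of staged files. The helper name `exceeds_file_limit` and the limit of 50 are made up for illustration; neither comes from git-confirm.

```shell
#!/bin/sh
# exceeds_file_limit LIMIT
# Hypothetical helper: reads one file name per line on stdin and
# succeeds (exit 0) when there are more than LIMIT lines.
exceeds_file_limit() {
    # tr strips the padding some wc implementations add around the count
    staged_count=$(wc -l | tr -d '[:space:]')
    [ "$staged_count" -gt "$1" ]
}

# Possible use near the top of a pre-commit hook (50 is an arbitrary limit):
#   if git diff --cached --name-only | exceeds_file_limit 50; then
#       echo "Too many staged files, skipping checks" >&2
#       exit 0
#   fi
```

Counting lines assumes no staged file name contains a newline. A file count is also only a rough proxy for how long the checks take, since a single large file can matter more than many small ones.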

sktguha avatar Feb 09 '18 12:02 sktguha

How many files is too many files? How large is the change overall?

pimterry avatar Feb 09 '18 12:02 pimterry

There were about 12 files or so.

sktguha avatar Feb 09 '18 12:02 sktguha

the change was about 1000 lines of unstaged changes

sktguha avatar Feb 09 '18 12:02 sktguha

That should still work fine, I've done commits of similar sizes before with no problems. How large are the files overall?

pimterry avatar Feb 09 '18 12:02 pimterry

ok let me get you that complete info

sktguha avatar Feb 09 '18 12:02 sktguha

CHARACTERS 88114 WORDS 6335 SENTENCES 1258 PARAGRAPHS 1728 WHITESPACE 23160

sktguha avatar Feb 09 '18 12:02 sktguha

I don't want to post the git diff file as it is from an internal repo, but I could replace the characters in the file with some token and post it.

sktguha avatar Feb 09 '18 12:02 sktguha

Is that the size of the diff, or the files themselves?

Number of lines would be useful too.

pimterry avatar Feb 09 '18 12:02 pimterry

the size of the diff. 1712 lines

sktguha avatar Feb 09 '18 12:02 sktguha

mydiff.txt

sktguha avatar Feb 09 '18 12:02 sktguha

Hmm, even that should be fine. What git-confirm actually does is:

  • For each pattern, for each changed file, get the diff:

    changes=$(git diff -U999999999 -p --cached --color=always -- $filename)
    
  • Grep that diff for added/changed lines:

    echo "$changes" | grep -C4 $'^\e\\[32m\+.*'"$match_pattern"
    
  • (and then report any matches)

The interesting question is which part of that is slow for you, and why. If we know that, we can work out how to detect this case quickly and reliably, and then do something about it (or stop it being slow in the first place).

If I take the diff you've uploaded there, and git add it in a fresh repo, doing this is pretty much instant, so that alone isn't the problem. I can't actually remember why we include the -U999999999 here, and that might get expensive with very large files only containing a few changes, since we're effectively grepping the whole thing. Could that be the problem?

There's also some duplication here: I don't think there's any good reason not to do the diff once per file and then grep repeatedly, rather than repeating everything for every pattern. Do you have a very long list of custom patterns configured?
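As a rough sketch of that restructuring (not the actual hook code), the diff could be computed once per file and the result grepped once per pattern. The function names here are hypothetical, and the regex uses `[+]` for a literal plus sign because `\+` is a repetition operator in GNU grep's basic regular expressions.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of git-confirm.

# check_diff_against_patterns DIFF PATTERN...
# Greps an already-computed coloured diff once per pattern.
# Exits 0 if any pattern matched an added line, 1 otherwise.
check_diff_against_patterns() (
    changes=$1
    shift
    esc=$(printf '\033')   # the ESC byte that starts git's colour codes
    status=1
    for match_pattern in "$@"; do
        # Added lines are green (ESC[32m) and start with "+".
        if printf '%s\n' "$changes" |
            grep -C4 "^${esc}\\[32m[+].*${match_pattern}"; then
            status=0
        fi
    done
    exit "$status"
)

# check_staged_file FILE PATTERN...
# Runs git diff a single time for FILE, then reuses the output.
check_staged_file() (
    filename=$1
    shift
    changes=$(git diff -U999999999 -p --cached --color=always -- "$filename")
    check_diff_against_patterns "$changes" "$@"
)
```

With this shape, `check_staged_file src/app.js TODO FIXME` (file name and patterns chosen only as an example) would invoke `git diff` once instead of once per pattern. The function bodies use `( ... )` so their variables stay inside a subshell.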

pimterry avatar Feb 09 '18 12:02 pimterry

ah no.

#!/bin/sh

. git-sh-setup # for die
git-diff-index -p -M --cached HEAD -- | grep '^+' | grep @#rbc && die Blocking commit because string @#rbc detected in patch

sktguha avatar Feb 09 '18 12:02 sktguha

So I guess I can run the git-confirm sh script separately to see where the bottleneck is.

sktguha avatar Feb 09 '18 12:02 sktguha

Could you give me an sh file which outputs the start and end time of each operation, so I can see where it is taking time?

sktguha avatar Feb 09 '18 12:02 sktguha

The real code is here: https://github.com/pimterry/git-confirm/blob/master/hook.sh#L50-L55

You can add date on line 50, 52 & 55 to get printed timestamps before & after each step.

If you've got your repo in the right state (all your changes added, ready to commit) you should also be able to just run the commands above with the appropriate $filename and $match_pattern to reproduce this.
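If editing the hook by hand is awkward, a small wrapper along these lines could print the timestamps instead. This is only a sketch: `timed` is a hypothetical helper, not something git-confirm provides, and `date` is used at one-second resolution, which should be enough to spot a step that hangs.

```shell
#!/bin/sh
# timed LABEL COMMAND [ARG...]
# Hypothetical helper: prints a timestamp to stderr before and after
# COMMAND, leaving COMMAND's stdout and exit status untouched.
timed() {
    timed_label=$1
    shift
    echo "$(date '+%H:%M:%S') start: $timed_label" >&2
    "$@"
    timed_status=$?
    echo "$(date '+%H:%M:%S') end:   $timed_label" >&2
    return "$timed_status"
}

# Possible use with the two steps described earlier in the thread
# ($filename and $match_pattern must be set first):
#   changes=$(timed "diff" git diff -U999999999 -p --cached --color=always -- "$filename")
#   echo "$changes" | timed "grep" grep -C4 $'^\e\\[32m\+.*'"$match_pattern"
```

Because the timestamps go to stderr, they do not end up in the captured `$changes` or in the grep output.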

pimterry avatar Feb 09 '18 13:02 pimterry