
Performance improvement

Open · cg110 opened this issue 4 years ago · 2 comments

Hi,

I was using this tool to fix some directory casing problems, i.e. `git-unite -d`

and found that progress was very slow. This appears to be because the code that computes indexEntries is quite slow, as we have 40-50k files and paths in the index.

I did a bit of optimization and changed the code here: https://github.com/tawman/git-unite/blob/master/src/LibGitUnite/UniteRepository.cs#L156

to:

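```csharp
var indexEntries =
    _gitRepository.Index.Where(f =>
    {
        // Locate the directory part of the index path once, using the char overload.
        var lastSeparator = f.Path.LastIndexOf(CharSeparator);

        // Files in the repository root have no directory to check.
        if (lastSeparator == -1)
            return false;

        var directoryPath = f.Path.Substring(0, lastSeparator);

        // Keep entries whose directory does not appear in any on-disk folder path.
        return !foldersFullPathMap.Any(s => s.Contains(directoryPath));

        // Previous version, which recomputed LastIndexOf/Substring for every folder:
        //return f.Path.LastIndexOf(Separator, StringComparison.Ordinal) != -1
        //       &&
        //       !foldersFullPathMap.Any(s =>
        //           s.Contains(f.Path.Substring(0,
        //               f.Path.LastIndexOf(Separator, StringComparison.Ordinal))));
    });
```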

Note that I also added a `CharSeparator` constant: `private const char CharSeparator = '\\';`

Testing on an i9-9900K with an SSD, git-unite 2.1 takes 6m25s.

With the above optimizations it takes 2m26s. I did wonder whether more could be done, e.g. making foldersFullPathMap hold only the git (repository-relative) paths rather than the full paths.

Still, that's more than a 50% reduction (385 s down to 146 s, roughly 62%).
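The idea of keeping only git (repository-relative) paths could go one step further with a set lookup, so each index entry costs one hash lookup instead of a scan over every folder. Below is a minimal sketch under stated assumptions: the class and method names and the `repoRoot` parameter are hypothetical (not git-unite's API), paths are assumed to use `\` like `CharSeparator`, and every on-disk directory level is assumed to be present in the folder list. It uses exact, case-sensitive matching rather than the original substring `Contains()`, so it is not a drop-in replacement for the existing filter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "git paths only" idea; these names are not git-unite's API.
public static class RelativeFolderLookup
{
    // Assumes folder and index paths use '\' (as CharSeparator does in this thread)
    // and that every directory level on disk appears in folderFullPaths.
    public static HashSet<string> BuildRelativeFolderSet(IEnumerable<string> folderFullPaths, string repoRoot)
    {
        var root = repoRoot.TrimEnd('\\') + "\\";
        // Ordinal (case-sensitive) comparer, so "src\lib" and "src\Lib" stay distinct.
        var set = new HashSet<string>(StringComparer.Ordinal);
        foreach (var full in folderFullPaths)
        {
            // Strip the repository root once, e.g. "C:\repo\src\Lib" becomes "src\Lib".
            if (full.StartsWith(root, StringComparison.OrdinalIgnoreCase))
                set.Add(full.Substring(root.Length));
        }
        return set;
    }

    // Returns index paths whose directory part has no exact, case-sensitive match on disk.
    public static List<string> FindMismatchedEntries(IEnumerable<string> indexPaths, HashSet<string> relativeFolders)
    {
        var mismatched = new List<string>();
        foreach (var path in indexPaths)
        {
            var lastSeparator = path.LastIndexOf('\\');
            if (lastSeparator == -1)
                continue; // file in the repository root: no directory to compare
            var directoryPath = path.Substring(0, lastSeparator);
            // One hash lookup per entry instead of scanning every folder path.
            if (!relativeFolders.Contains(directoryPath))
                mismatched.Add(path);
        }
        return mismatched;
    }
}
```

For example, with folders `C:\repo\src` and `C:\repo\src\Lib` and root `C:\repo`, the index path `src\lib\b.cs` is reported while `src\Lib\a.cs` and `README.md` are not.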

cg110 · Jun 18 '20 12:06

@cg110 thanks for running some metrics; I will take another look at optimizing. It has been a long time since this code was written, but I know I was checking LastIndexOf() for a specific directory naming structure to avoid false positives. A simple Contains() might match on a deeper tree.

I will try to generate a 50k-file test repo when I get a chance.

tawman · Jun 18 '20 13:06

No probs. It's probably unusual to have such large repos, but now and again someone has wrongly cased directories (generally because they have an older clone and the directory was later re-cased), and this tool is great for finding that.

I don't think I've changed the logic; I just broke the bits down so that the .Any() doesn't redo the LastIndexOf() and Substring() for every folder. It was already doing s.Contains() before.

I'm actually not sure which change gave the main speed-up: switching to the char overload, or doing each piece of work only once.
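One way to tell which change mattered is to time the variants separately. The sketch below is a rough Stopwatch micro-benchmark over synthetic data; the `FilterBenchmark` class, the `MakeData` layout and the folder names are hypothetical, invented for illustration rather than taken from git-unite. Variant 1 mirrors the original expression, variant 2 only hoists the LastIndexOf/Substring work out of `Any()`, and variant 3 additionally switches to the `char` overload. All three should return the same count, so any timing difference between 1 and 2 comes from hoisting, and between 2 and 3 from the overload.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical micro-benchmark separating the two candidate speed-ups.
public static class FilterBenchmark
{
    const string Separator = "\\";
    const char CharSeparator = '\\';

    // 1. Original shape: LastIndexOf(string) and Substring re-run for every folder inside Any().
    public static int Original(List<string> index, List<string> folders) =>
        index.Count(p => p.LastIndexOf(Separator, StringComparison.Ordinal) != -1
            && !folders.Any(s => s.Contains(p.Substring(0, p.LastIndexOf(Separator, StringComparison.Ordinal)))));

    // 2. Work hoisted out of Any(), still using the string overload.
    public static int HoistedString(List<string> index, List<string> folders) =>
        index.Count(p =>
        {
            var last = p.LastIndexOf(Separator, StringComparison.Ordinal);
            if (last == -1) return false;
            var dir = p.Substring(0, last);
            return !folders.Any(s => s.Contains(dir));
        });

    // 3. Work hoisted out of Any() and using the char overload, as in the snippet above.
    public static int HoistedChar(List<string> index, List<string> folders) =>
        index.Count(p =>
        {
            var last = p.LastIndexOf(CharSeparator);
            if (last == -1) return false;
            var dir = p.Substring(0, last);
            return !folders.Any(s => s.Contains(dir));
        });

    // Runs one variant once and returns its result count and elapsed milliseconds.
    public static (int Count, long Ms) Time(Func<List<string>, List<string>, int> variant,
                                            List<string> index, List<string> folders)
    {
        var sw = Stopwatch.StartNew();
        var count = variant(index, folders);
        sw.Stop();
        return (count, sw.ElapsedMilliseconds);
    }

    // Synthetic data (hypothetical layout): one on-disk folder per directory,
    // plus a single index entry whose directory casing does not exist on disk.
    public static (List<string> Index, List<string> Folders) MakeData(int dirs, int filesPerDir)
    {
        var folders = new List<string>();
        var index = new List<string>();
        for (int d = 0; d < dirs; d++)
        {
            folders.Add($@"C:\repo\dir{d}");
            for (int f = 0; f < filesPerDir; f++)
                index.Add($@"dir{d}\file{f}.cs");
        }
        index.Add(@"DIR0\miscased.cs");
        return (index, folders);
    }
}
```

For example, `FilterBenchmark.MakeData(2000, 25)` produces about 50k index entries, close to the repository size described in this thread; pass the result to `FilterBenchmark.Time` with each of the three variants. Single Stopwatch runs are noisy (JIT warm-up, GC), so a harness such as BenchmarkDotNet would give more trustworthy numbers.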

cg110 · Jun 19 '20 09:06