libgit2sharp

repo.Commits.QueryBy(filename) slow on large repos

[Open] tster123 opened this issue 6 years ago • 6 comments

Reproduction steps

1. Clone a large repo.
2. Run this function on that repo with some random file:

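```csharp
using System.Collections.Generic;
using LibGit2Sharp;

// Member of a class that has a string field repoRoot holding the
// absolute path of the repository's working directory.
public IEnumerable<string> TestSlow(string filename)
{
    using (var repo = new Repository(repoRoot))
    {
        // QueryBy expects a repository-relative path with forward slashes.
        string path = filename.Substring(repoRoot.Length + 1).Replace("\\", "/");
        foreach (LogEntry entry in repo.Commits.QueryBy(path))
        {
            yield return entry.Commit.Author.ToString();
        }
    }
}
```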
3. Run this command on the same file: `git log --follow --oneline -- <file>`

Expected behavior

I expect TestSlow and the git log command above to take a similar amount of time.

Actual behavior

The git log command finishes in about 1.6 ms on my repo. The TestSlow command takes about 70 seconds.

Here is what I see in my profiler: [screenshot: profiler view]

Version of LibGit2Sharp (release number or SHA1)

0.27.0-preview-0017, 0.26.1, 0.24.1

Operating system(s) tested; .NET runtime tested

.NET Framework 4.7.2 on Windows 10

Posted by tster123, Aug 20 '19 18:08

As a note, I tried different sort options but was limited because FileHistory doesn't support None or Reverse:

```
System.ArgumentException: Unsupported sort strategy. Only 'Topological', 'Time', or 'Topological | Time' are allowed.
Parameter name: queryFilter
    at LibGit2Sharp.Core.FileHistory..ctor(Repository repo, String path, CommitFilter queryFilter) in C:\projects\libgit2sharp\LibGit2Sharp\Core\FileHistory.cs:line 76
```

Posted by tster123, Aug 20 '19 18:08

I just submitted a proposed solution. The change to FileHistory makes it run in 14 seconds instead of 80, and the change in Tree gets it down to about 8 seconds.

I see the continuous integration build failed, but that looks like a problem with the CI environment rather than with my code. Here is the error (Linux only):

```
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/4d8a90ac-c758-402b-94d3-740f95fba16d.sh
/usr/share/dotnet/sdk/2.2.105/NuGet.targets(499,5): error : Could not find a part of the path '/tmp/NuGetScratch/e31463d7-84e6-4141-aa64-0e5166476164'. [/home/vsts/work/1/s/LibGit2Sharp/LibGit2Sharp.csproj]
##[error]Bash exited with code '1'.
##[section]Finishing: CmdLine
```

Posted by tster123, Aug 20 '19 21:08

Awesome work! I ran into the same issue and had to wrap git.exe and use git log to speed up reading the log.

Posted by Blueve, Feb 20 '20 04:02

@Blueve Could you share your code for parsing the git.exe output?

Posted by blackboxlogic, Oct 13 '21 15:10

> @Blueve Could you share your code for parsing the git.exe output?

We read the console output of a git log command such as `git --no-pager log --date-order --no-merges --no-renames --pretty=format:@/%H/ --stat=512 -- {0}`, where `{0}` is the formatted path filter. Other commands and options can be used for other purposes.

We then parse the output line by line. The output is made up of repeated blocks of this form:

```
<empty line>
<commit sha line, start with @/ and end with />
<file path> | <changed lines>
<file path> | <changed lines>
...
<file path> | <changed lines>
```

Sorry, I can't share the full code since it is internal only.
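This is a minimal C# sketch of the two steps described above: running `git log` and parsing its output line by line. It is not Blueve's code. `CommitEntry`, `GitLogParser`, `RunGit`, and `ParseLog` are hypothetical names. The sketch makes these assumptions:

* `git` is on PATH.
* Each commit header is a line of the form `@/<sha>/`.
* Every file stat line contains a `|`.
* The `N files changed, ...` summary line that `--stat` prints contains no `|` and can be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical result type: one commit SHA plus the paths listed in its --stat block.
public sealed class CommitEntry
{
    public string Sha { get; }
    public List<string> Paths { get; } = new List<string>();
    public CommitEntry(string sha) { Sha = sha; }
}

public static class GitLogParser
{
    // Runs git in the given directory and returns its standard output.
    // Assumes "git" is on PATH; the caller supplies the full argument string.
    public static string RunGit(string workingDirectory, string arguments)
    {
        var psi = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = workingDirectory,
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        using (var process = Process.Start(psi))
        {
            // Read stdout to the end before waiting, so a full pipe cannot deadlock.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    // Parses output of: git --no-pager log ... --pretty=format:@/%H/ --stat=512 -- <filter>
    public static List<CommitEntry> ParseLog(string output)
    {
        var commits = new List<CommitEntry>();
        CommitEntry current = null;
        foreach (string rawLine in output.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r'); // tolerate CRLF output on Windows
            if (line.Length == 0)
                continue; // blank separator lines carry no data

            if (line.StartsWith("@/") && line.EndsWith("/") && line.Length > 3)
            {
                // Header line "@/<sha>/": start a new commit entry.
                current = new CommitEntry(line.Substring(2, line.Length - 3));
                commits.Add(current);
                continue;
            }

            // Stat lines look like " path/to/file | 12 +++---". Splitting on the
            // last '|' keeps the whole path; lines without '|' are skipped.
            int bar = line.LastIndexOf('|');
            if (current != null && bar > 0)
            {
                current.Paths.Add(line.Substring(0, bar).Trim());
            }
        }
        return commits;
    }
}
```

Usage might look like `GitLogParser.ParseLog(GitLogParser.RunGit(repoRoot, "--no-pager log --date-order --no-merges --no-renames --pretty=format:@/%H/ --stat=512 -- src/Foo.cs"))`, where `src/Foo.cs` is only a placeholder. The parser skips blank lines entirely and splits on the last `|`. This keeps it tolerant of small layout differences, such as binary-file stat lines ending in `Bin 0 -> 123 bytes`.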

Posted by Blueve, Oct 15 '21 02:10