
Extremely slow with large repositories.

Open AnalogFeelings opened this issue 2 years ago • 13 comments

This program is very slow when running it on large repositories, for example ReactOS.

AnalogFeelings avatar Mar 20 '22 15:03 AnalogFeelings

@o2sh Should we revisit this?

spenserblack avatar Mar 24 '22 12:03 spenserblack

Sure, but after #211 and #309, I am running out of ideas. 😞

ping @CephalonRho @yoichi @HallerPatrick

o2sh avatar Mar 24 '22 14:03 o2sh

A lot of time still appears to be spent reading commit information in Repo::get_logs, most of which is only used by Repo::get_authors.

Reading commits in parallel seems like a way to speed this up a bit, but I'm not sure if this is possible with libgit2. git-repository from gitoxide seems like a promising alternative.

Otherwise some form of caching could work, but I'm not sure if this is a good idea.

shuni64 avatar Mar 25 '22 14:03 shuni64

gitoxide is indeed a promising alternative. It might be an idea worth exploring even if it means losing some features like mailmap.

o2sh avatar Mar 25 '22 21:03 o2sh

mailmap is on their radar, at least. I guess we can open an issue to show them that there's interest for it 😆

If we do drop .mailmap support, reopening #447, we should probably make a release before we drop support. That way users that have used the .mailmap feature (#596) will have a release that's as up-to-date as possible before the breaking change.

spenserblack avatar Mar 25 '22 22:03 spenserblack

I created the issue https://github.com/Byron/gitoxide/issues/363.

If we do drop .mailmap support, reopening https://github.com/o2sh/onefetch/issues/447, we should probably make a release before we drop support. That way users that have used the .mailmap feature (https://github.com/o2sh/onefetch/issues/596) will have a release that's as up-to-date as possible before the breaking change.

That's a perfect plan 💯

o2sh avatar Mar 25 '22 23:03 o2sh

Reading commits in parallel seems like a way to speed this up a bit, but I'm not sure if this is possible with libgit2. git-repository from gitoxide seems like a promising alternative.

It looks like major speedups are possible even on a single thread for commit graph traversal. For example, I'd expect onefetch to go from ~19s on the v5.16 linux kernel checkout to something like 11s.

If more/all operations are done in parallel, like the syntax analysis, it should become as fast as the slowest of these operations: the commit graph traversal, which clocks in at ~7s.

I wouldn't know how to parallelize the commit graph traversal though. The only way I can imagine this working is to traverse different branches on multiple threads. This usually comes with the overhead of keeping the threads from doing duplicate work, which requires a parallel hashset (like dashmap), and that limits the number of threads that are effective there. It also depends on the graph: a linear history with a single trunk can't be sped up at all. Anyway, it sounds like an interesting task to implement high-performance parallel traversal; maybe gitoxide can provide one once all other options are exhausted here.
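To make the "traverse different branches on multiple threads" idea concrete, here is a minimal sketch on a toy graph. It is not gitoxide or libgit2 code: the `Graph` type, the integer commit ids, and `count_reachable_parallel` are all made up for illustration, and a `Mutex<HashSet>` from the standard library stands in for a concurrent set such as dashmap. Each thread starts at one branch tip and skips any commit that another thread has already claimed.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;
use std::thread;

/// Hypothetical toy commit graph: commit id -> parent ids.
pub type Graph = HashMap<u32, Vec<u32>>;

/// Counts the commits reachable from `tips`, walking each tip on its own thread.
/// The shared `seen` set is what keeps threads from repeating each other's work.
pub fn count_reachable_parallel(graph: &Graph, tips: &[u32]) -> usize {
    let seen = Mutex::new(HashSet::new());
    thread::scope(|scope| {
        for &tip in tips {
            let seen = &seen;
            scope.spawn(move || {
                let mut stack = vec![tip];
                while let Some(id) = stack.pop() {
                    // `insert` returns false if the commit was already claimed,
                    // either by this thread or by another one.
                    if !seen.lock().unwrap().insert(id) {
                        continue;
                    }
                    if let Some(parents) = graph.get(&id) {
                        stack.extend(parents.iter().copied());
                    }
                }
            });
        }
    });
    seen.into_inner().unwrap().len()
}
```

Every visit takes the same lock, which is the contention cost described above; with a single tip and a linear history, only one thread ever has work to do.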

Screen Shot 2022-03-26 at 09 15 31

The above is my profiling run on the linux kernel v5.16. Most of it is the commit graph traversal, the spike towards the end is tokei, and there is ~1s of releasing memory (which can and should probably be avoided with process::abort() or by leaking the values with mem::forget()).

All in all, I think that by switching to gitoxide, running just the syntax analysis in parallel, and calling process::abort() to avoid memory deallocation at the end, one should be able to get roughly 2x the speed on the linux kernel.
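The "skip the deallocation at exit" idea can be sketched as follows. `Report`, `build_report`, `summarize_and_leak`, and the `DROPS` counter are hypothetical names used only for this illustration; they are not onefetch types. The sketch uses `mem::forget`, which leaks the value so its destructor never runs. `process::abort()` has a similar effect on cleanup cost but terminates the process abnormally, so it is not shown here.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Report` values have been dropped (for demonstration only).
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a large result structure built during a run.
pub struct Report {
    pub lines: Vec<String>,
}

impl Drop for Report {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Builds a report holding `n` heap-allocated strings.
pub fn build_report(n: usize) -> Report {
    Report {
        lines: (0..n).map(|i| format!("line {i}")).collect(),
    }
}

/// Reads what it needs from the report, then leaks it instead of freeing it.
/// The operating system reclaims the memory when the process exits.
pub fn summarize_and_leak(report: Report) -> usize {
    let count = report.lines.len();
    mem::forget(report); // no destructor runs, no per-string deallocation
    count
}
```

Leaking only makes sense right before the process ends, and only for values whose destructors do nothing but free memory.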

Byron avatar Mar 26 '22 01:03 Byron

Thanks a lot for your input @Byron. I've created two issues as a follow-up: #628 and #629.

o2sh avatar Mar 26 '22 19:03 o2sh

I couldn't resist doing a quick measurement on reactos, which seems tame compared to the linux kernel. The numbers, however, are even more promising.

Screen Shot 2022-03-27 at 08 14 38

It seems that ~1.5s are spent on the commit graph traversal, a task which could be accomplished in ~0.4s with gitoxide.

Screen Shot 2022-03-27 at 08 15 09

Interestingly, tokei finished in ~0.5s, so it appears that if both ran in parallel, the onefetch invocation should finish in about 0.5s, down from ~1.5s. Can't wait to see this happen!
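As a rough sketch of running the two phases side by side, the following uses `std::thread::scope` so that wall time approaches that of the slower phase. The functions `count_authors` and `count_lines` are simplified, hypothetical stand-ins for the commit graph traversal and the tokei analysis; they are not the real onefetch or tokei APIs.

```rust
use std::collections::HashSet;
use std::thread;

/// Stand-in for the commit graph traversal: counts distinct author names.
pub fn count_authors(commit_authors: &[&str]) -> usize {
    commit_authors.iter().collect::<HashSet<_>>().len()
}

/// Stand-in for the language analysis: counts non-empty lines across files.
pub fn count_lines(files: &[&str]) -> usize {
    files
        .iter()
        .map(|f| f.lines().filter(|l| !l.trim().is_empty()).count())
        .sum()
}

/// Runs both phases concurrently and returns (authors, lines).
pub fn analyze(commit_authors: &[&str], files: &[&str]) -> (usize, usize) {
    thread::scope(|scope| {
        // The traversal runs on a spawned thread...
        let traversal = scope.spawn(|| count_authors(commit_authors));
        // ...while the line count runs on the current thread.
        let lines = count_lines(files);
        let authors = traversal.join().expect("traversal thread panicked");
        (authors, lines)
    })
}
```

For example, `analyze(&["a", "b", "a"], &["x\n\ny\n", "z"])` yields two authors and three non-empty lines.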

Byron avatar Mar 27 '22 00:03 Byron

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

o2sh avatar Jul 10 '22 00:07 o2sh

For completeness, here are the final values, before…

❯ /usr/bin/time -lp onefetch-pre-gitoxide
                 ++++++                    Sebastian Thiel ~ git version 2.32.1 (Apple Git-133)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: reactos (57 branches, 274 tags)
       ++++++++++++++++++++++++++          HEAD: 2ba6b09 (master, origin/master)
    ++++++++++++++++++++++++++++++++       Version: 0.4.14-release
 +++++++++++++************+++++++++++++    Created: 26 years ago
+++++++++++******************++++++++;;;   Languages:
+++++++++**********************++;;;;;;;              ● C (89.6 %) ● C++ (9.4 %)
++++++++*********++++++******;;;;;;;;;;;              ● CMake (0.6 %) ● Python (0.2 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● JavaScript (0.1 %) ● HTML (0.0 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Other (0.1 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;   Authors: 8% Amine Khaldi 6777
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;            7% Timo Kreuzer 5536
+++++++********::::::::::**;;;;;;;;;;;;;            6% Eric Kohl 4808
++++++++*********::::::******;;;;;;;;;;;   Last change: 3 months ago
++++++:::**********************::;;;;;;;   Contributors: 361
+++::::::::******************::::::::;;;   Repo: https://github.com/reactos/reactos
 :::::::::::::************:::::::::::::    Commits: 81668
    ::::::::::::::::::::::::::::::::       Lines of code: 4900543
       ::::::::::::::::::::::::::          Size: 401.85 MiB (26712 files)
          ::::::::::::::::::::             License: BSD-2-Clause-Views, GPL-2.0-only, GPL-3.0-only, LGPL-2.1-only, LGPL-3.0-only
              ::::::::::::
                 ::::::

real 1.61
user 2.56
sys 0.78
           196935680  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               12570  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   7  voluntary context switches
               10044  involuntary context switches
         32654906025  instructions retired
          9907271625  cycles elapsed
           113985408  peak memory footprint

…in 1.61s with a ~114 MB peak memory footprint (113985408 bytes), and after…

❯ /usr/bin/time -lp onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.1 (Apple Git-133)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: reactos (57 branches, 274 tags)
       ++++++++++++++++++++++++++          HEAD: 2ba6b097543 (master, origin/master)
    ++++++++++++++++++++++++++++++++       Version: 0.4.14-release
 +++++++++++++************+++++++++++++    Created: 26 years ago
+++++++++++******************++++++++;;;   Languages:
+++++++++**********************++;;;;;;;              ● C (89.6 %) ● C++ (9.4 %)
++++++++*********++++++******;;;;;;;;;;;              ● CMake (0.6 %) ● Python (0.2 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● JavaScript (0.1 %) ● HTML (0.0 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Other (0.1 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;   Authors: 8% Amine Khaldi 6777
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;            7% Timo Kreuzer 5536
+++++++********::::::::::**;;;;;;;;;;;;;            6% Eric Kohl 4808
++++++++*********::::::******;;;;;;;;;;;   Last change: 3 months ago
++++++:::**********************::;;;;;;;   Contributors: 361
+++::::::::******************::::::::;;;   Repo: https://github.com/reactos/reactos
 :::::::::::::************:::::::::::::    Commits: 81668
    ::::::::::::::::::::::::::::::::       Lines of code: 4900543
       ::::::::::::::::::::::::::          Size: 401.85 MiB (26712 files)
          ::::::::::::::::::::             License: BSD-2-Clause-Views, GPL-2.0-only, GPL-3.0-only, LGPL-2.1-only, LGPL-3.0-only
              ::::::::::::
                 ::::::

real 0.62
user 2.23
sys 0.77
           127516672  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                8541  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   8  voluntary context switches
               10711  involuntary context switches
         26983149694  instructions retired
          8683590898  cycles elapsed
            43878208  peak memory footprint

…in 0.62s with a ~44 MB peak memory footprint (43878208 bytes).
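The two runs can be compared directly from the `real` and `peak memory footprint` lines in the `time` output above. The helpers below are illustrative only (their names are made up) and assume that `time -l` reports the footprint in bytes.

```rust
/// Converts a byte count to decimal megabytes (1 MB = 1,000,000 bytes).
pub fn bytes_to_mb(bytes: u64) -> f64 {
    bytes as f64 / 1_000_000.0
}

/// Speedup factor: how many times faster `after` is than `before`.
pub fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Percentage saved going from `before` to `after`.
pub fn percent_saved(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}
```

With the numbers from the two runs, `speedup(1.61, 0.62)` is about 2.6, `bytes_to_mb(113_985_408)` is about 114, `bytes_to_mb(43_878_208)` is about 43.9, and the peak footprint drops by roughly 61.5%.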

Byron avatar Jul 10 '22 00:07 Byron

@Byron 😮 ❤️

BTW, do you have an ETA for:

  • [ ] config 'user.name', remote information
  • [ ] git status/pending changes

tracking issue --> https://github.com/Byron/gitoxide/issues/364

o2sh avatar Jul 10 '22 09:07 o2sh

config 'user.name', remote information

This will probably be available this month as I am currently working hard to get git-config a big step closer to 1.0. All the building blocks are there already, so this can happen earlier with a few more lines of code in onefetch.

git status/pending changes

This one is further away, as this year is entirely dedicated to cloning-related issues and integration into cargo, which doesn't yet involve worktree status. That said, if it goes well I will use the extra time to get it ready earlier.

Byron avatar Jul 10 '22 10:07 Byron

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

o2sh avatar Oct 16 '22 00:10 o2sh