gitui
PoC: Use gitoxide for rev walk
This is a PoC, mainly intended to explore two things:
- how easy/difficult it is to switch git2 for gix for a single use case,
- how both approaches compare with respect to performance.
Findings
This version uses repo.rev_walk(). We could also try implementing the existing algorithm using gix primitives, but I think it makes sense to optimize repo.rev_walk() so that every consumer benefits from improvements.
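For readers unfamiliar with what a hand-written rev walk involves, here is a minimal sketch of a newest-first traversal over a toy in-memory commit graph. It is not gitui's LogWalker and uses neither gix nor git2; the Commit type, the u32 ids, and walk_newest_first are hypothetical names made up for this illustration.

```rust
use std::collections::{BinaryHeap, HashMap, HashSet};

/// Toy commit (hypothetical): a commit time and the ids of its parents.
struct Commit {
    time: i64,
    parents: Vec<u32>,
}

/// Returns the ids of all commits reachable from `tip`, newest first.
fn walk_newest_first(graph: &HashMap<u32, Commit>, tip: u32) -> Vec<u32> {
    // Max-heap keyed on (time, id), so the newest queued commit pops first.
    let mut queue = BinaryHeap::new();
    let mut seen = HashSet::new();
    let mut order = Vec::new();

    if let Some(commit) = graph.get(&tip) {
        queue.push((commit.time, tip));
        seen.insert(tip);
    }

    while let Some((_, id)) = queue.pop() {
        order.push(id);
        // Only ids found in `graph` are ever queued, so indexing is safe here.
        for &parent in &graph[&id].parents {
            // Queue each parent once, even if several children point to it.
            if seen.insert(parent) {
                if let Some(commit) = graph.get(&parent) {
                    queue.push((commit.time, parent));
                }
            }
        }
    }
    order
}
```

A real implementation would read commit times and parents from the object database instead of a HashMap, which is where most of the time in a large repository is likely to go.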
It seems as if using gix instead of git2 is fairly straightforward. There are a few changes necessary, but none of them major. Also, it seems as if the changes can mostly be contained in LogWalker, in particular without the need to change LogWalker::read’s API. (LogWalker::new needs to be changed slightly, but this does not look like an issue.)
Performance-wise, it seems as if the gix implementation is slower than the git2 implementation by about 30 %. I tested how long it took to open the app in my copy of the Linux kernel until the loading indicator stopped spinning, indicating that the full list of commit ids had been loaded. My copy of the Linux kernel contains 1_014_089 commits.
gix with use_commit_graph(true): about 22 s
gix with use_commit_graph(false): about 22 s
git2 (at commit 038c4a50): about 17 s
Keep in mind that these are very rough numbers. It’s also possible that the gix API can be used in a way that is faster.
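As a rough check of the "about 30 %" figure, the relative difference and the average throughput can be derived from the timings above. This is a small sketch; percent_slower and commits_per_second are helper names invented here, not part of gitui, gix, or git2.

```rust
/// How much longer `slower_secs` is than `faster_secs`,
/// expressed as a percentage of `faster_secs`.
fn percent_slower(slower_secs: f64, faster_secs: f64) -> f64 {
    (slower_secs - faster_secs) / faster_secs * 100.0
}

/// Average traversal throughput in commits per second.
fn commits_per_second(commits: u64, secs: f64) -> f64 {
    commits as f64 / secs
}
```

With the numbers above, percent_slower(22.0, 17.0) comes out at roughly 29, and 1_014_089 commits work out to roughly 46k commits/s for the 22 s run versus roughly 60k commits/s for the 17 s run.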
gix with features = ["max-performance"]
When using features = ["max-performance"], gix is about 25 % faster on my machine than the implementation based on git2.
gix with use_commit_graph(false): about 13 s
gix with use_commit_graph(true): about 13 s
git2 without hash verification or caching
I added the following lines to repo in asyncgit::sync::repository, right before Repository::open_ext, and it was even faster.
enable_caching(false);
strict_hash_verification(false);
git2 without hash verification or caching: about 10 s
Edit 2024-06-16: I also got flamegraphs for both implementations that I could share.
Edit 2024-06-17: I added numbers for features = ["max-performance"]. I’ll later also add a flamegraph and will test use_commit_graph(true).
Edit 2024-06-18: I added numbers for git2 without hash verification or caching.
Flamegraph using gix
Flamegraph using gix with features = ["max-performance"]
Flamegraph using git2
Flamegraph using git2 without hash verification or caching
That's all very exciting, thanks for getting started with gix in gitui :)!
I think it's worth noting that gix does not validate the objects it reads right now, which makes it less safe than git2, but as safe as Git (as far as I could tell).
This git2 behaviour can be deactivated with strict_hash_verification, and maybe more performance can be obtained by disabling object caching.
@cruessler did you ever benchmark this?
@extrawurst Yes, at the time I added the numbers in the first post, under “git2 without hash verification or caching”. It was even faster than gix with max-performance.
That is interesting!
A note on "gix with use_commit_graph(true): about 13 s": this would be about 10 times faster if a commit-graph was actually used. If one is present, it is also used by default (unless disabled in the git config). Was it created with git commit-graph write --reachable?
Something that seems strange here is that the numbers don't seem to match my own.
For instance, a simple commit traversal (that cannot use the commit-graph cache) can be done (with a hot FS-cache) at 138k commits/s on an M1 Pro.
❯ ein t hours
09:50:50 traverse commit graph done 1.3M commits in 9.46s (138.4K commits/s)
09:50:50 estimate-hours Extracted and organized data from 1309152 commits in 11.885291ms (110148920 commits/s)
total hours: 1243723.88
total 8h days: 155465.48
total commits = 1309152
total authors: 34641
total unique authors: 26510 (23.47% duplication)
linux ( master) +798 -408 [!] took 9s
❯ git rev-parse @
87d6aab2389e5ce0197d8257d5f8ee965a67c4cd
That code uses the Simple iteration directly, which is used through abstractions here. I'd hope that these don't cause such a slowdown.
It would certainly be interesting, @cruessler, to see what ein t hours says on your machine. For a perfect comparison, one would certainly want to write a non-GUI program that does the traversal to be sure the right thing is measured.
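A non-GUI measurement could be as small as a function that drains an iterator of commits and records wall-clock time. The sketch below is library-agnostic and uses only the standard library; Measurement and measure_walk are hypothetical names, and the iterator passed in would be whatever walk the gix or git2 code under test produces.

```rust
use std::time::{Duration, Instant};

/// Result of timing one full traversal (hypothetical helper type).
struct Measurement {
    commits: usize,
    elapsed: Duration,
}

impl Measurement {
    /// Average throughput over the whole traversal.
    fn commits_per_second(&self) -> f64 {
        self.commits as f64 / self.elapsed.as_secs_f64()
    }
}

/// Drains `walk`, counting the items it yields, and measures how long that takes.
fn measure_walk<I: Iterator>(walk: I) -> Measurement {
    let start = Instant::now();
    let commits = walk.count();
    Measurement {
        commits,
        elapsed: start.elapsed(),
    }
}
```

Running each walk several times and discarding the first run would help separate a cold from a hot filesystem cache, which matters for numbers like the ones quoted above.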
Damaged pride aside 😅, I am glad that git2 is this awesome, and that I could help.
To add a bit more context: I’ve created https://github.com/cruessler/gix-benchmarks in order to be able to more thoroughly compare history traversal speed of both gix and git2. It seems that gix is significantly faster, in particular in the Linux kernel. (I hope that I didn’t make a mistake in the benchmark code. :smile:)
Also: I did not know about git commit-graph write --reachable at the time, and the numbers most certainly reflect that. :smile:
The benchmark sets strict_object_creation(false) and strict_hash_verification(false): https://github.com/cruessler/gix-benchmarks/blob/73711297dbad890da146f113c3d9f7e92f0afac7/src/main.rs#L63-L65.
This is very promising!