gix index from-tree
TODOs
- [x] write an index file based on a tree object id
- [x] performance test on big repos (find or create one)
- [ ] ~~look at progress reporting if necessary~~
Git
| . | TREE | TREE PREFIX | COMMIT | COMMIT PREFIX | TREE WITH FILE OUTPUT |
|---|---|---|---|---|---|
| Gix | 9.9 ms | 9.4 ms | 9.4 ms | 9.5 ms | 100.2 ms |
| Git | 11.6 ms | 11.5 ms | 11.7 ms | 11.5 ms | 12.9 ms |
Rust
| . | TREE | TREE PREFIX | COMMIT | COMMIT PREFIX | TREE WITH FILE OUTPUT |
|---|---|---|---|---|---|
| Gix | 171.7 ms | 171.6 ms | 172.0 ms | 172.6 ms | 1.002 s |
| Git | 67.1 ms | 66.2 ms | 65.1 ms | 64.6 ms | 83.6 ms |
Linux
| . | TREE | TREE PREFIX | COMMIT | COMMIT PREFIX | TREE WITH FILE OUTPUT |
|---|---|---|---|---|---|
| Gix | 2.070 s | 2.059 s | 2.061 s | 2.054 | 3.779 s |
| Git | 141.8 ms | 143.7 ms | 142.9 ms | 143.5 ms | 181.8 ms |
Benchmark Script
#!/bin/bash
gix_benchmark() {
echo "Running gix benchmark on repository: \"$1\""
echo
(
cd "$1" || return
head_commit=$(git rev-parse HEAD)
head_commit_prefix=${head_commit:0:7}
tree_id=$(git rev-parse '@^{tree}')
tree_id_prefix=${tree_id:0:7}
hyperfine --warmup 10 --runs "$2" "gix index from-tree $tree_id"
hyperfine --warmup 10 --runs "$2" "gix index from-tree $tree_id_prefix"
hyperfine --warmup 10 --runs "$2" "gix index from-tree $head_commit"
hyperfine --warmup 10 --runs "$2" "gix index from-tree $head_commit_prefix"
hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "gix index from-tree $tree_id -i ../test.index"
)
echo
echo "----------------------------------------------------------------"
echo
}
git_benchmark() {
echo "Running git benchmark on repository: \"$1\""
echo
(
cd "$1" || return
head_commit=$(git rev-parse HEAD)
head_commit_prefix=${head_commit:0:7}
tree_id=$(git rev-parse '@^{tree}')
tree_id_prefix=${tree_id:0:7}
hyperfine --warmup 10 --runs "$2" "git read-tree -n $tree_id"
hyperfine --warmup 10 --runs "$2" "git read-tree -n $tree_id_prefix"
hyperfine --warmup 10 --runs "$2" "git read-tree -n $head_commit"
hyperfine --warmup 10 --runs "$2" "git read-tree -n $head_commit_prefix"
hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "git read-tree $tree_id --index-output=../test.index"
)
echo
echo "----------------------------------------------------------------"
echo
}
gix_benchmark git 250
git_benchmark git 250
gix_benchmark rust 150
git_benchmark rust 150
gix_benchmark linux 150
git_benchmark linux 150
Output
Running gix benchmark on repository: "git"
Benchmark 1: gix index from-tree cf23c192d11315613a05b7bbdd3e142182bc1749
Time (mean ± σ): 9.9 ms ± 2.1 ms [User: 7.9 ms, System: 3.2 ms]
Range (min … max): 8.9 ms … 38.0 ms 250 runs
Benchmark 1: gix index from-tree cf23c19
Time (mean ± σ): 9.4 ms ± 0.4 ms [User: 7.9 ms, System: 3.2 ms]
Range (min … max): 8.9 ms … 11.9 ms 250 runs
Benchmark 1: gix index from-tree 30cc8d0f147546d4dd77bf497f4dec51e7265bd8
Time (mean ± σ): 9.4 ms ± 0.4 ms [User: 7.9 ms, System: 3.2 ms]
Range (min … max): 9.0 ms … 13.3 ms 250 runs
Benchmark 1: gix index from-tree 30cc8d0
Time (mean ± σ): 9.5 ms ± 0.4 ms [User: 7.9 ms, System: 3.2 ms]
Range (min … max): 9.1 ms … 11.1 ms 250 runs
Benchmark 1: gix index from-tree cf23c192d11315613a05b7bbdd3e142182bc1749 -i ../test.index
Time (mean ± σ): 100.2 ms ± 2.7 ms [User: 13.1 ms, System: 88.3 ms]
Range (min … max): 97.0 ms … 128.6 ms 250 runs
----------------------------------------------------------------
Running git benchmark on repository: "git"
Benchmark 1: git read-tree -n cf23c192d11315613a05b7bbdd3e142182bc1749
Time (mean ± σ): 11.6 ms ± 0.9 ms [User: 8.1 ms, System: 2.3 ms]
Range (min … max): 11.2 ms … 25.5 ms 250 runs
Benchmark 1: git read-tree -n cf23c19
Time (mean ± σ): 11.5 ms ± 0.4 ms [User: 8.1 ms, System: 2.4 ms]
Range (min … max): 11.2 ms … 14.6 ms 250 runs
Benchmark 1: git read-tree -n 30cc8d0f147546d4dd77bf497f4dec51e7265bd8
Time (mean ± σ): 11.7 ms ± 1.6 ms [User: 8.1 ms, System: 2.4 ms]
Range (min … max): 11.2 ms … 31.9 ms 250 runs
Benchmark 1: git read-tree -n 30cc8d0
Time (mean ± σ): 11.5 ms ± 0.2 ms [User: 8.1 ms, System: 2.4 ms]
Range (min … max): 11.1 ms … 12.7 ms 250 runs
Benchmark 1: git read-tree cf23c192d11315613a05b7bbdd3e142182bc1749 --index-output=../test.index
Time (mean ± σ): 12.9 ms ± 0.9 ms [User: 9.2 ms, System: 2.6 ms]
Range (min … max): 12.3 ms … 26.1 ms 250 runs
----------------------------------------------------------------
Running gix benchmark on repository: "rust"
Benchmark 1: gix index from-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4
Time (mean ± σ): 171.7 ms ± 4.5 ms [User: 165.3 ms, System: 8.1 ms]
Range (min … max): 168.8 ms … 218.4 ms 150 runs
Benchmark 1: gix index from-tree 28b9a79
Time (mean ± σ): 171.6 ms ± 1.7 ms [User: 165.2 ms, System: 8.1 ms]
Range (min … max): 169.2 ms … 178.4 ms 150 runs
Benchmark 1: gix index from-tree cd4d9d934fd3bc1b6a0b0fcb3548a1b26fc53c9d
Time (mean ± σ): 172.0 ms ± 1.3 ms [User: 165.7 ms, System: 8.0 ms]
Range (min … max): 170.0 ms … 178.3 ms 150 runs
Benchmark 1: gix index from-tree cd4d9d9
Time (mean ± σ): 172.6 ms ± 1.7 ms [User: 166.1 ms, System: 8.0 ms]
Range (min … max): 170.1 ms … 185.1 ms 150 runs
Benchmark 1: gix index from-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4 -i ../test.index
Time (mean ± σ): 1.002 s ± 0.018 s [User: 0.213 s, System: 0.783 s]
Range (min … max): 0.972 s … 1.073 s 150 runs
----------------------------------------------------------------
Running git benchmark on repository: "rust"
Benchmark 1: git read-tree -n 28b9a7976fd3103a1a2b087a1fac01e3519db8d4
Time (mean ± σ): 67.1 ms ± 5.9 ms [User: 58.6 ms, System: 5.9 ms]
Range (min … max): 62.5 ms … 98.0 ms 150 runs
Benchmark 1: git read-tree -n 28b9a79
Time (mean ± σ): 66.2 ms ± 4.5 ms [User: 58.2 ms, System: 5.9 ms]
Range (min … max): 62.2 ms … 100.6 ms 150 runs
Benchmark 1: git read-tree -n cd4d9d934fd3bc1b6a0b0fcb3548a1b26fc53c9d
Time (mean ± σ): 65.1 ms ± 2.6 ms [User: 57.3 ms, System: 5.7 ms]
Range (min … max): 62.7 ms … 81.1 ms 150 runs
Benchmark 1: git read-tree -n cd4d9d9
Time (mean ± σ): 64.6 ms ± 2.4 ms [User: 57.2 ms, System: 5.6 ms]
Range (min … max): 62.4 ms … 78.2 ms 150 runs
Benchmark 1: git read-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4 --index-output=../test.index
Time (mean ± σ): 83.6 ms ± 4.4 ms [User: 67.2 ms, System: 6.9 ms]
Range (min … max): 73.8 ms … 103.0 ms 150 runs
----------------------------------------------------------------
Running gix benchmark on repository: "linux"
Benchmark 1: gix index from-tree c56e40167a7531d26d19433566533b7510a6f9ba
Time (mean ± σ): 2.070 s ± 0.031 s [User: 2.042 s, System: 0.023 s]
Range (min … max): 2.035 s … 2.238 s 150 runs
Benchmark 1: gix index from-tree c56e401
Time (mean ± σ): 2.059 s ± 0.016 s [User: 2.036 s, System: 0.023 s]
Range (min … max): 2.036 s … 2.145 s 150 runs
Benchmark 1: gix index from-tree a1375562c0a87f0fa2eaf3e8ce15824696d4170a
Time (mean ± σ): 2.061 s ± 0.026 s [User: 2.037 s, System: 0.022 s]
Range (min … max): 2.035 s … 2.211 s 150 runs
Benchmark 1: gix index from-tree a137556
Time (mean ± σ): 2.054 s ± 0.013 s [User: 2.033 s, System: 0.022 s]
Range (min … max): 2.036 s … 2.120 s 150 runs
Benchmark 1: gix index from-tree c56e40167a7531d26d19433566533b7510a6f9ba -i ../test.index
Time (mean ± σ): 3.779 s ± 0.032 s [User: 2.129 s, System: 1.634 s]
Range (min … max): 3.707 s … 3.968 s 150 runs
----------------------------------------------------------------
Running git benchmark on repository: "linux"
Benchmark 1: git read-tree -n c56e40167a7531d26d19433566533b7510a6f9ba
Time (mean ± σ): 141.8 ms ± 1.6 ms [User: 124.2 ms, System: 15.5 ms]
Range (min … max): 139.6 ms … 148.8 ms 150 runs
Benchmark 1: git read-tree -n c56e401
Time (mean ± σ): 143.7 ms ± 6.0 ms [User: 125.4 ms, System: 15.8 ms]
Range (min … max): 139.8 ms … 188.8 ms 150 runs
Benchmark 1: git read-tree -n a1375562c0a87f0fa2eaf3e8ce15824696d4170a
Time (mean ± σ): 142.9 ms ± 3.6 ms [User: 124.8 ms, System: 15.6 ms]
Range (min … max): 139.6 ms … 163.5 ms 150 runs
Benchmark 1: git read-tree -n a137556
Time (mean ± σ): 143.5 ms ± 2.8 ms [User: 125.5 ms, System: 15.8 ms]
Range (min … max): 139.5 ms … 157.1 ms 150 runs
Benchmark 1: git read-tree c56e40167a7531d26d19433566533b7510a6f9ba --index-output=../test.index
Time (mean ± σ): 181.8 ms ± 13.0 ms [User: 152.1 ms, System: 17.6 ms]
Range (min … max): 170.2 ms … 254.4 ms 150 runs
----------------------------------------------------------------
Thanks for the benchmarks! I am a big fan of those! When looking at the performance difference I can only only hope that gix was compiled in debug mode. But we don't get off that easily, right? The file-output performance is probably lacking because it's missing a BufWriter.
You can experiment with caches and see if these improve performance. gix responds to these environment variables which makes it easy to play around with.
Last but not least, Instruments time-profiler is the tool of choice to see where time is spent. My feeling is the tree-traversal is terribly slow, even though I never experienced it to be that bad, so I'd hope there is other issues causing this. I am looking at Linux in particular when saying this, as 'bad' gitoxide performance is usually in the 30% slower than git range, and not 1500% slower 😅.
Actually, maybe it's exponential behavior in one of the algorithms, since it gets so much worse the bigger the tree. Wasn't there as retain() somewhere which I thought 'should be fine', but now it isn't?
Git
| . | No File Output | With File Output |
|---|---|---|
| Gix | 5.3 ms | 5.3 ms |
| Git | 13.4 ms | 13.8 ms |
Rust
| . | No File Output | With File Output |
|---|---|---|
| Gix | 32.4 ms | 32.3 ms |
| Git | 67.6 ms | 80.2 ms |
Linux
| . | No File Output | With File Output |
|---|---|---|
| Gix | 60.1 ms | 62.0 ms |
| Git | 149.5 ms | 184.3 ms |
Benchmark Script
#!/bin/bash
gix_benchmark() {
echo "Running gix benchmark on repository: \"$1\""
echo
(
cd "$1" || return
hyperfine --warmup 10 --runs "$2" "gix index from-tree '@^{tree}'"
hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "gix index from-tree '@^{tree}' -i ../test.index"
)
echo
echo "----------------------------------------------------------------"
echo
}
git_benchmark() {
echo "Running git benchmark on repository: \"$1\""
echo
(
cd "$1" || return
hyperfine --warmup 10 --runs "$2" "git read-tree -n '@^{tree}'"
hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "git read-tree '@^{tree}' --index-output=../test.index"
)
echo
echo "----------------------------------------------------------------"
echo
}
gix_benchmark git 150
git_benchmark git 150
gix_benchmark rust 100
git_benchmark rust 100
gix_benchmark linux 50
git_benchmark linux 50
Output
Running gix benchmark on repository: "git"
Benchmark 1: gix index from-tree '@^{tree}'
Time (mean ± σ): 5.3 ms ± 0.4 ms [User: 3.9 ms, System: 2.9 ms]
Range (min … max): 4.8 ms … 7.3 ms 150 runs
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
Time (mean ± σ): 5.3 ms ± 0.8 ms [User: 3.7 ms, System: 3.2 ms]
Range (min … max): 4.7 ms … 11.0 ms 150 runs
----------------------------------------------------------------
Running git benchmark on repository: "git"
Benchmark 1: git read-tree -n '@^{tree}'
Time (mean ± σ): 13.4 ms ± 2.7 ms [User: 9.1 ms, System: 2.8 ms]
Range (min … max): 11.6 ms … 35.7 ms 150 runs
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
Time (mean ± σ): 13.8 ms ± 0.8 ms [User: 9.8 ms, System: 2.8 ms]
Range (min … max): 12.7 ms … 19.0 ms 150 runs
----------------------------------------------------------------
Running gix benchmark on repository: "rust"
Benchmark 1: gix index from-tree '@^{tree}'
Time (mean ± σ): 32.4 ms ± 1.9 ms [User: 25.7 ms, System: 8.0 ms]
Range (min … max): 30.7 ms … 46.0 ms 100 runs
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
Time (mean ± σ): 32.3 ms ± 3.6 ms [User: 21.0 ms, System: 10.3 ms]
Range (min … max): 28.7 ms … 60.0 ms 100 runs
----------------------------------------------------------------
Running git benchmark on repository: "rust"
Benchmark 1: git read-tree -n '@^{tree}'
Time (mean ± σ): 67.6 ms ± 4.9 ms [User: 59.3 ms, System: 6.3 ms]
Range (min … max): 64.1 ms … 95.1 ms 100 runs
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
Time (mean ± σ): 80.2 ms ± 5.3 ms [User: 69.9 ms, System: 7.3 ms]
Range (min … max): 75.9 ms … 101.9 ms 100 runs
----------------------------------------------------------------
Running gix benchmark on repository: "linux"
Benchmark 1: gix index from-tree '@^{tree}'
Time (mean ± σ): 60.1 ms ± 2.7 ms [User: 45.3 ms, System: 15.5 ms]
Range (min … max): 58.5 ms … 72.0 ms 50 runs
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
Time (mean ± σ): 62.0 ms ± 6.1 ms [User: 36.9 ms, System: 20.2 ms]
Range (min … max): 54.7 ms … 87.0 ms 50 runs
----------------------------------------------------------------
Running git benchmark on repository: "linux"
Benchmark 1: git read-tree -n '@^{tree}'
Time (mean ± σ): 149.5 ms ± 9.7 ms [User: 129.8 ms, System: 16.8 ms]
Range (min … max): 141.8 ms … 185.8 ms 50 runs
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
Time (mean ± σ): 184.3 ms ± 13.7 ms [User: 158.5 ms, System: 19.2 ms]
Range (min … max): 172.8 ms … 248.3 ms 50 runs
----------------------------------------------------------------
That's awesome! This tells us that writing the index costs nearly nothing if buffered, not entirely unsurprising, but surprising how fast it actually is. And whatever git is doing there costs it dearly, and as far as I know it only creates the TREE extension on top of what we are doing. And that data, you already have so it's strange it costs so much.
Maybe it's just that much more efficient, maybe they do .insert() into a vector after all which just happens to be way faster than what we have been doing.
Do you think it's ready for review? If so, you can signal me by changing the PR state accordingly.
Thanks a lot!
I made a few adjustments before merging in case you want to have a look.
It turns out the indices written weren't valid due to missing trailer, the missing tests and API changes will be coming in via #556 .
git is way slower in traversing trees than gitoxide seemingly is, but gains points when writing the index back. Didn't check, but maybe it's created fully in memory whereas we do a couple more write calls thanks to small buffer sizes?
Apparently git keeps additional caches when traversing trees and that costs even though the caches aren't useful here. It's similar if we turn on an object cache, which reduces performance measurably as there is no cache hit (just costs of maintaining the cache). Nice to have that level of control and turn caches on only when they are always beneficial.