gitoxide icon indicating copy to clipboard operation
gitoxide copied to clipboard

gix index from-tree

Open SidneyDouw opened this issue 3 years ago • 5 comments

TODOs

  • [x] write an index file based on a tree object id
  • [x] performance test on big repos (find or create one)
  • [ ] ~~look at progress reporting if necessary~~

SidneyDouw avatar Sep 15 '22 14:09 SidneyDouw

Git

. TREE TREE PREFIX COMMIT COMMIT PREFIX TREE WITH FILE OUTPUT
Gix 9.9 ms 9.4 ms 9.4 ms 9.5 ms 100.2 ms
Git 11.6 ms 11.5 ms 11.7 ms 11.5 ms 12.9 ms

Rust

. TREE TREE PREFIX COMMIT COMMIT PREFIX TREE WITH FILE OUTPUT
Gix 171.7 ms 171.6 ms 172.0 ms 172.6 ms 1.002 s
Git 67.1 ms 66.2 ms 65.1 ms 64.6 ms 83.6 ms

Linux

. TREE TREE PREFIX COMMIT COMMIT PREFIX TREE WITH FILE OUTPUT
Gix 2.070 s 2.059 s 2.061 s 2.054 3.779 s
Git 141.8 ms 143.7 ms 142.9 ms 143.5 ms 181.8 ms

Benchmark Script

#!/bin/bash

gix_benchmark() {
    echo "Running gix benchmark on repository: \"$1\""
    echo
    (
        cd "$1" || return

        head_commit=$(git rev-parse HEAD)
        head_commit_prefix=${head_commit:0:7}

        tree_id=$(git rev-parse '@^{tree}')
        tree_id_prefix=${tree_id:0:7}

        hyperfine --warmup 10 --runs "$2" "gix index from-tree $tree_id"
        hyperfine --warmup 10 --runs "$2" "gix index from-tree $tree_id_prefix"
        hyperfine --warmup 10 --runs "$2" "gix index from-tree $head_commit"
        hyperfine --warmup 10 --runs "$2" "gix index from-tree $head_commit_prefix"

        hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "gix index from-tree $tree_id -i ../test.index"
    )
    echo
    echo "----------------------------------------------------------------"
    echo
}

git_benchmark() {
    echo "Running git benchmark on repository: \"$1\""
    echo
    (
        cd "$1" || return

        head_commit=$(git rev-parse HEAD)
        head_commit_prefix=${head_commit:0:7}

        tree_id=$(git rev-parse '@^{tree}')
        tree_id_prefix=${tree_id:0:7}

        hyperfine --warmup 10 --runs "$2" "git read-tree -n $tree_id"
        hyperfine --warmup 10 --runs "$2" "git read-tree -n $tree_id_prefix"
        hyperfine --warmup 10 --runs "$2" "git read-tree -n $head_commit"
        hyperfine --warmup 10 --runs "$2" "git read-tree -n $head_commit_prefix"

        hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "git read-tree $tree_id --index-output=../test.index"
    )
    echo
    echo "----------------------------------------------------------------"
    echo
}

gix_benchmark git 250
git_benchmark git 250

gix_benchmark rust 150
git_benchmark rust 150

gix_benchmark linux 150
git_benchmark linux 150

Output

Running gix benchmark on repository: "git"

Benchmark 1: gix index from-tree cf23c192d11315613a05b7bbdd3e142182bc1749
  Time (mean ± σ):       9.9 ms ±   2.1 ms    [User: 7.9 ms, System: 3.2 ms]
  Range (min … max):     8.9 ms …  38.0 ms    250 runs
 
Benchmark 1: gix index from-tree cf23c19
  Time (mean ± σ):       9.4 ms ±   0.4 ms    [User: 7.9 ms, System: 3.2 ms]
  Range (min … max):     8.9 ms …  11.9 ms    250 runs
 
Benchmark 1: gix index from-tree 30cc8d0f147546d4dd77bf497f4dec51e7265bd8
  Time (mean ± σ):       9.4 ms ±   0.4 ms    [User: 7.9 ms, System: 3.2 ms]
  Range (min … max):     9.0 ms …  13.3 ms    250 runs
 
Benchmark 1: gix index from-tree 30cc8d0
  Time (mean ± σ):       9.5 ms ±   0.4 ms    [User: 7.9 ms, System: 3.2 ms]
  Range (min … max):     9.1 ms …  11.1 ms    250 runs
 
Benchmark 1: gix index from-tree cf23c192d11315613a05b7bbdd3e142182bc1749 -i ../test.index
  Time (mean ± σ):     100.2 ms ±   2.7 ms    [User: 13.1 ms, System: 88.3 ms]
  Range (min … max):    97.0 ms … 128.6 ms    250 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "git"

Benchmark 1: git read-tree -n cf23c192d11315613a05b7bbdd3e142182bc1749
  Time (mean ± σ):      11.6 ms ±   0.9 ms    [User: 8.1 ms, System: 2.3 ms]
  Range (min … max):    11.2 ms …  25.5 ms    250 runs
 
Benchmark 1: git read-tree -n cf23c19
  Time (mean ± σ):      11.5 ms ±   0.4 ms    [User: 8.1 ms, System: 2.4 ms]
  Range (min … max):    11.2 ms …  14.6 ms    250 runs
 
Benchmark 1: git read-tree -n 30cc8d0f147546d4dd77bf497f4dec51e7265bd8
  Time (mean ± σ):      11.7 ms ±   1.6 ms    [User: 8.1 ms, System: 2.4 ms]
  Range (min … max):    11.2 ms …  31.9 ms    250 runs
 
Benchmark 1: git read-tree -n 30cc8d0
  Time (mean ± σ):      11.5 ms ±   0.2 ms    [User: 8.1 ms, System: 2.4 ms]
  Range (min … max):    11.1 ms …  12.7 ms    250 runs
 
Benchmark 1: git read-tree cf23c192d11315613a05b7bbdd3e142182bc1749 --index-output=../test.index
  Time (mean ± σ):      12.9 ms ±   0.9 ms    [User: 9.2 ms, System: 2.6 ms]
  Range (min … max):    12.3 ms …  26.1 ms    250 runs
 

----------------------------------------------------------------

Running gix benchmark on repository: "rust"

Benchmark 1: gix index from-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4
  Time (mean ± σ):     171.7 ms ±   4.5 ms    [User: 165.3 ms, System: 8.1 ms]
  Range (min … max):   168.8 ms … 218.4 ms    150 runs
 
Benchmark 1: gix index from-tree 28b9a79
  Time (mean ± σ):     171.6 ms ±   1.7 ms    [User: 165.2 ms, System: 8.1 ms]
  Range (min … max):   169.2 ms … 178.4 ms    150 runs
 
Benchmark 1: gix index from-tree cd4d9d934fd3bc1b6a0b0fcb3548a1b26fc53c9d
  Time (mean ± σ):     172.0 ms ±   1.3 ms    [User: 165.7 ms, System: 8.0 ms]
  Range (min … max):   170.0 ms … 178.3 ms    150 runs
 
Benchmark 1: gix index from-tree cd4d9d9
  Time (mean ± σ):     172.6 ms ±   1.7 ms    [User: 166.1 ms, System: 8.0 ms]
  Range (min … max):   170.1 ms … 185.1 ms    150 runs
 
Benchmark 1: gix index from-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4 -i ../test.index
  Time (mean ± σ):      1.002 s ±  0.018 s    [User: 0.213 s, System: 0.783 s]
  Range (min … max):    0.972 s …  1.073 s    150 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "rust"

Benchmark 1: git read-tree -n 28b9a7976fd3103a1a2b087a1fac01e3519db8d4
  Time (mean ± σ):      67.1 ms ±   5.9 ms    [User: 58.6 ms, System: 5.9 ms]
  Range (min … max):    62.5 ms …  98.0 ms    150 runs
 
Benchmark 1: git read-tree -n 28b9a79
  Time (mean ± σ):      66.2 ms ±   4.5 ms    [User: 58.2 ms, System: 5.9 ms]
  Range (min … max):    62.2 ms … 100.6 ms    150 runs
 
Benchmark 1: git read-tree -n cd4d9d934fd3bc1b6a0b0fcb3548a1b26fc53c9d
  Time (mean ± σ):      65.1 ms ±   2.6 ms    [User: 57.3 ms, System: 5.7 ms]
  Range (min … max):    62.7 ms …  81.1 ms    150 runs
 
Benchmark 1: git read-tree -n cd4d9d9
  Time (mean ± σ):      64.6 ms ±   2.4 ms    [User: 57.2 ms, System: 5.6 ms]
  Range (min … max):    62.4 ms …  78.2 ms    150 runs
 
Benchmark 1: git read-tree 28b9a7976fd3103a1a2b087a1fac01e3519db8d4 --index-output=../test.index
  Time (mean ± σ):      83.6 ms ±   4.4 ms    [User: 67.2 ms, System: 6.9 ms]
  Range (min … max):    73.8 ms … 103.0 ms    150 runs
 

----------------------------------------------------------------

Running gix benchmark on repository: "linux"

Benchmark 1: gix index from-tree c56e40167a7531d26d19433566533b7510a6f9ba
  Time (mean ± σ):      2.070 s ±  0.031 s    [User: 2.042 s, System: 0.023 s]
  Range (min … max):    2.035 s …  2.238 s    150 runs
 
Benchmark 1: gix index from-tree c56e401
  Time (mean ± σ):      2.059 s ±  0.016 s    [User: 2.036 s, System: 0.023 s]
  Range (min … max):    2.036 s …  2.145 s    150 runs
 
Benchmark 1: gix index from-tree a1375562c0a87f0fa2eaf3e8ce15824696d4170a
  Time (mean ± σ):      2.061 s ±  0.026 s    [User: 2.037 s, System: 0.022 s]
  Range (min … max):    2.035 s …  2.211 s    150 runs
 
Benchmark 1: gix index from-tree a137556
  Time (mean ± σ):      2.054 s ±  0.013 s    [User: 2.033 s, System: 0.022 s]
  Range (min … max):    2.036 s …  2.120 s    150 runs
 
Benchmark 1: gix index from-tree c56e40167a7531d26d19433566533b7510a6f9ba -i ../test.index
  Time (mean ± σ):      3.779 s ±  0.032 s    [User: 2.129 s, System: 1.634 s]
  Range (min … max):    3.707 s …  3.968 s    150 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "linux"

Benchmark 1: git read-tree -n c56e40167a7531d26d19433566533b7510a6f9ba
  Time (mean ± σ):     141.8 ms ±   1.6 ms    [User: 124.2 ms, System: 15.5 ms]
  Range (min … max):   139.6 ms … 148.8 ms    150 runs
 
Benchmark 1: git read-tree -n c56e401
  Time (mean ± σ):     143.7 ms ±   6.0 ms    [User: 125.4 ms, System: 15.8 ms]
  Range (min … max):   139.8 ms … 188.8 ms    150 runs
 
Benchmark 1: git read-tree -n a1375562c0a87f0fa2eaf3e8ce15824696d4170a
  Time (mean ± σ):     142.9 ms ±   3.6 ms    [User: 124.8 ms, System: 15.6 ms]
  Range (min … max):   139.6 ms … 163.5 ms    150 runs
 
Benchmark 1: git read-tree -n a137556
  Time (mean ± σ):     143.5 ms ±   2.8 ms    [User: 125.5 ms, System: 15.8 ms]
  Range (min … max):   139.5 ms … 157.1 ms    150 runs
 
Benchmark 1: git read-tree c56e40167a7531d26d19433566533b7510a6f9ba --index-output=../test.index
  Time (mean ± σ):     181.8 ms ±  13.0 ms    [User: 152.1 ms, System: 17.6 ms]
  Range (min … max):   170.2 ms … 254.4 ms    150 runs
 

----------------------------------------------------------------


SidneyDouw avatar Sep 27 '22 10:09 SidneyDouw

Thanks for the benchmarks! I am a big fan of those! When looking at the performance difference I can only only hope that gix was compiled in debug mode. But we don't get off that easily, right? The file-output performance is probably lacking because it's missing a BufWriter.

You can experiment with caches and see if these improve performance. gix responds to these environment variables which makes it easy to play around with.

Last but not least, Instruments time-profiler is the tool of choice to see where time is spent. My feeling is the tree-traversal is terribly slow, even though I never experienced it to be that bad, so I'd hope there is other issues causing this. I am looking at Linux in particular when saying this, as 'bad' gitoxide performance is usually in the 30% slower than git range, and not 1500% slower 😅.

Byron avatar Sep 27 '22 10:09 Byron

Actually, maybe it's exponential behavior in one of the algorithms, since it gets so much worse the bigger the tree. Wasn't there as retain() somewhere which I thought 'should be fine', but now it isn't?

Byron avatar Sep 27 '22 11:09 Byron

Git

. No File Output With File Output
Gix 5.3 ms 5.3 ms
Git 13.4 ms 13.8 ms

Rust

. No File Output With File Output
Gix 32.4 ms 32.3 ms
Git 67.6 ms 80.2 ms

Linux

. No File Output With File Output
Gix 60.1 ms 62.0 ms
Git 149.5 ms 184.3 ms

Benchmark Script

#!/bin/bash

gix_benchmark() {
    echo "Running gix benchmark on repository: \"$1\""
    echo
    (
        cd "$1" || return

        hyperfine --warmup 10 --runs "$2" "gix index from-tree '@^{tree}'"
        hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "gix index from-tree '@^{tree}' -i ../test.index"
    )
    echo
    echo "----------------------------------------------------------------"
    echo
}

git_benchmark() {
    echo "Running git benchmark on repository: \"$1\""
    echo
    (
        cd "$1" || return

        hyperfine --warmup 10 --runs "$2" "git read-tree -n '@^{tree}'"
        hyperfine --warmup 10 --runs "$2" --prepare "rm -f ../test.index" "git read-tree '@^{tree}' --index-output=../test.index"
    )
    echo
    echo "----------------------------------------------------------------"
    echo
}

gix_benchmark git 150
git_benchmark git 150

gix_benchmark rust 100
git_benchmark rust 100

gix_benchmark linux 50
git_benchmark linux 50


Output

Running gix benchmark on repository: "git"

Benchmark 1: gix index from-tree '@^{tree}'
  Time (mean ± σ):       5.3 ms ±   0.4 ms    [User: 3.9 ms, System: 2.9 ms]
  Range (min … max):     4.8 ms …   7.3 ms    150 runs
 
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
  Time (mean ± σ):       5.3 ms ±   0.8 ms    [User: 3.7 ms, System: 3.2 ms]
  Range (min … max):     4.7 ms …  11.0 ms    150 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "git"

Benchmark 1: git read-tree -n '@^{tree}'
  Time (mean ± σ):      13.4 ms ±   2.7 ms    [User: 9.1 ms, System: 2.8 ms]
  Range (min … max):    11.6 ms …  35.7 ms    150 runs
 
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
  Time (mean ± σ):      13.8 ms ±   0.8 ms    [User: 9.8 ms, System: 2.8 ms]
  Range (min … max):    12.7 ms …  19.0 ms    150 runs
 

----------------------------------------------------------------

Running gix benchmark on repository: "rust"

Benchmark 1: gix index from-tree '@^{tree}'
  Time (mean ± σ):      32.4 ms ±   1.9 ms    [User: 25.7 ms, System: 8.0 ms]
  Range (min … max):    30.7 ms …  46.0 ms    100 runs
 
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
  Time (mean ± σ):      32.3 ms ±   3.6 ms    [User: 21.0 ms, System: 10.3 ms]
  Range (min … max):    28.7 ms …  60.0 ms    100 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "rust"

Benchmark 1: git read-tree -n '@^{tree}'
  Time (mean ± σ):      67.6 ms ±   4.9 ms    [User: 59.3 ms, System: 6.3 ms]
  Range (min … max):    64.1 ms …  95.1 ms    100 runs
 
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
  Time (mean ± σ):      80.2 ms ±   5.3 ms    [User: 69.9 ms, System: 7.3 ms]
  Range (min … max):    75.9 ms … 101.9 ms    100 runs
 

----------------------------------------------------------------

Running gix benchmark on repository: "linux"

Benchmark 1: gix index from-tree '@^{tree}'
  Time (mean ± σ):      60.1 ms ±   2.7 ms    [User: 45.3 ms, System: 15.5 ms]
  Range (min … max):    58.5 ms …  72.0 ms    50 runs
 
Benchmark 1: gix index from-tree '@^{tree}' -i ../test.index
  Time (mean ± σ):      62.0 ms ±   6.1 ms    [User: 36.9 ms, System: 20.2 ms]
  Range (min … max):    54.7 ms …  87.0 ms    50 runs
 

----------------------------------------------------------------

Running git benchmark on repository: "linux"

Benchmark 1: git read-tree -n '@^{tree}'
  Time (mean ± σ):     149.5 ms ±   9.7 ms    [User: 129.8 ms, System: 16.8 ms]
  Range (min … max):   141.8 ms … 185.8 ms    50 runs
 
Benchmark 1: git read-tree '@^{tree}' --index-output=../test.index
  Time (mean ± σ):     184.3 ms ±  13.7 ms    [User: 158.5 ms, System: 19.2 ms]
  Range (min … max):   172.8 ms … 248.3 ms    50 runs
 

----------------------------------------------------------------


SidneyDouw avatar Sep 29 '22 16:09 SidneyDouw

That's awesome! This tells us that writing the index costs nearly nothing if buffered, not entirely unsurprising, but surprising how fast it actually is. And whatever git is doing there costs it dearly, and as far as I know it only creates the TREE extension on top of what we are doing. And that data, you already have so it's strange it costs so much.

Maybe it's just that much more efficient, maybe they do .insert() into a vector after all which just happens to be way faster than what we have been doing.

Do you think it's ready for review? If so, you can signal me by changing the PR state accordingly.

Byron avatar Sep 29 '22 23:09 Byron

Thanks a lot!

I made a few adjustments before merging in case you want to have a look.

Byron avatar Oct 12 '22 00:10 Byron

It turns out the indices written weren't valid due to missing trailer, the missing tests and API changes will be coming in via #556 .

Byron avatar Oct 12 '22 02:10 Byron

git is way slower in traversing trees than gitoxide seemingly is, but gains points when writing the index back. Didn't check, but maybe it's created fully in memory whereas we do a couple more write calls thanks to small buffer sizes?

Screen Shot 2022-10-12 at 16 25 57 Screen Shot 2022-10-12 at 16 30 20

Apparently git keeps additional caches when traversing trees and that costs even though the caches aren't useful here. It's similar if we turn on an object cache, which reduces performance measurably as there is no cache hit (just costs of maintaining the cache). Nice to have that level of control and turn caches on only when they are always beneficial.

Byron avatar Oct 12 '22 08:10 Byron