lakeFS
KV Graveler Benchmarks
Currently, there are no benchmarks for graveler. This makes it difficult to identify performance regressions caused by the move to KV.
We need to implement benchmarks and run them on both SQL and KV.
The benchmarks should include at least a representative set of operations for pkg/graveler/ref and pkg/graveler/staging, as these are the packages that access the DB.
DoD: A representative benchmark that runs on both SQL and KV, and a conclusion regarding the performance of graveler over KV.
Note: We still need to decide which benchmarks are required, that is, which scenarios we really want to test. One example is multiple parallel commits, which will repeatedly fail each other until all retries succeed. This is an extreme scenario that stresses performance to the edge, but it may not be as interesting as a "real life" scenario: we can already predict that it will show performance degradation for our non-lock-free commit algorithm. Another example is a single long commit with multiple staging operations taking place at the same time. This scenario is much more common in real-life usage, and a performance regression here is something we had better discover sooner rather than later.
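To make the first scenario concrete, here is a toy model of parallel committers that fail each other and retry. It is only a sketch: `Branch`, `compare_and_set`, and `commit_with_retry` are hypothetical names, the branch head is a plain counter, and nothing here reflects graveler's actual commit code.

```python
import threading


class Branch:
    """Toy branch head guarded by compare-and-set (NOT graveler's implementation)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.head = 0

    def compare_and_set(self, expected, new):
        # Move the head only if nobody else moved it since we read it.
        with self._lock:
            if self.head != expected:
                return False
            self.head = new
            return True


def commit_with_retry(branch, max_retries=1000):
    """Return how many attempts failed before this commit landed."""
    for failed_attempts in range(max_retries):
        seen = branch.head  # read the current head
        # A real commit would write its metadata here, before the swap.
        if branch.compare_and_set(seen, seen + 1):
            return failed_attempts
    raise RuntimeError("commit did not succeed within max_retries")


def run_parallel_commits(n_committers):
    """Run n_committers concurrent commits; return (final head, total failed attempts)."""
    branch = Branch()
    retries = [0] * n_committers

    def worker(i):
        retries[i] = commit_with_retry(branch)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_committers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return branch.head, sum(retries)


head, wasted = run_parallel_commits(8)
print(f"commits landed: {head}, failed attempts: {wasted}")
```

In a benchmark of this scenario, the number of failed attempts might be worth recording alongside latency, since it shows how much work is wasted as contention grows.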
https://github.com/treeverse/lakeFS/pull/3669#discussion_r924021857 Add benchmark to BranchByCommitIterator
Test plan
Perform the following operations on lakeFS running locally with a Postgres container. Compare KV and DB results for:
- Write 100K records.
- Read 100K records.
- Commit 100K records.
- List 100K records.
- Commit every 5 seconds while writing to the same branch.
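The plan above could be driven by a small script. The sketch below only builds the `lakectl` command lines and prints them; the subcommands and flags are the ones that appear in the command transcripts in this thread, while `build_test_plan`, the `bench/` prefix, the `keys` file name, the commit message, and the repository URI are placeholders chosen for illustration.

```python
import shlex


def build_test_plan(branch_uri, amount=100_000, prefix="bench/", keys_file="keys"):
    """Return the lakectl command line for each step of the test plan.

    branch_uri is a URI such as lakefs://<repo>/<branch> (placeholder values).
    Nothing is executed here; the caller decides how to run and time each step.
    """
    return {
        "write": ["lakectl", "abuse", "random-write",
                  "--amount", str(amount), "--prefix", prefix, branch_uri],
        "read": ["lakectl", "abuse", "random-read",
                 "--amount", str(amount), "--from-file", keys_file, branch_uri],
        "commit": ["lakectl", "commit", branch_uri, "-m", "benchmark"],
        "list": ["lakectl", "fs", "ls", "--recursive", f"{branch_uri}/{prefix}"],
        "commit_while_writing": ["lakectl", "abuse", "commit", "--gap", "5s", branch_uri],
    }


plan = build_test_plan("lakefs://repo1/main")
for step, cmd in plan.items():
    print(f"{step}: {shlex.join(cmd)}")
```

Running the same plan once against the SQL configuration and once against the KV configuration keeps the two sets of results comparable.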
Results from running locally with Postgres, comparing DB and KV, are below. They include a quick fix for #3888. Bottom line: the KV results are equal to or better than the DB ones in all tests performed.
- Write 20K records:
  - DB: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 390 350 407 500 407 750 409 1000 18325 5000 20025 min 168 max 2609 total 20025
  - KV: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 186 75 3983 100 16533 250 19873 350 19900 500 19900 750 19900 1000 19900 5000 20000 min 36 max 1119 total 20000
- Read 100K records from uncommitted:
  - DB: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 70 75 1389 100 12810 250 108855 350 109231 500 109232 750 109232 1000 109232 5000 109232 min 30 max 373 total 109232
  - KV: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 154 75 4014 100 34105 250 109584 350 109646 500 109646 750 109646 1000 109646 5000 109846 min 27 max 1185 total 109846
- Commit records:
  - DB (50K): ./lakectl commit lakefs://repo1/test -m "test" 0.03s user 0.02s system 0% cpu 7.586 total
  - KV (100K): ./lakectl commit lakefs://repo1/main -m "test" 0.03s user 0.02s system 1% cpu 2.733 total
- Listing records from uncommitted:
  - DB (50K): ./lakectl fs ls --recursive lakefs://repo1/test/ 0.97s user 0.37s system 39% cpu 3.401 total
  - KV (100K): ./lakectl fs ls --recursive lakefs://repo1/main/ 1.89s user 0.74s system 40% cpu 6.533 total
- Commit every 5 seconds while writing to the same branch:
  - DB:
    - Commits: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 0 350 0 500 0 750 0 1000 0 5000 15 min 1653 max 3633 total 15
    - Writes: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 119 350 123 500 148 750 173 1000 3449 5000 5159 min 174 max 4395 total 5159
  - KV:
    - Commits: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 0 350 0 500 0 750 0 1000 0 5000 11 min 1472 max 10193 total 15
    - Writes: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 66 75 2254 100 13472 250 67363 350 68064 500 68085 750 68085 1000 68154 5000 68285 min 37 max 1146 total 68285
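The histograms above are easier to compare once reduced to a single number. The sketch below parses the flattened `Histogram (ms)` text and reports the share of requests that finished within a given bucket. It assumes the counts are cumulative (each bucket counts every request at or below that latency), which is inferred from the counts never decreasing and ending at or near `total`; `parse_histogram` and `share_within` are illustrative names, not part of lakectl.

```python
def parse_histogram(text):
    """Parse '... Histogram (ms): 1 0 2 0 ... min A max B total C'.

    Returns (buckets, stats): buckets is a list of (upper_bound_ms, cumulative_count),
    stats maps 'min', 'max' and 'total' to integers.
    """
    tokens = text.rsplit(":", 1)[-1].split()
    buckets, stats = [], {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key.isdigit():
            buckets.append((int(key), int(value)))
        else:
            stats[key] = int(value)
    return buckets, stats


def share_within(text, limit_ms):
    """Fraction of requests that completed within the limit_ms bucket."""
    buckets, stats = parse_histogram(text)
    for upper_ms, cumulative in buckets:
        if upper_ms == limit_ms:
            return cumulative / stats["total"]
    raise ValueError(f"no bucket with upper bound {limit_ms} ms")


# The "Write 20K records" results quoted above.
DB_WRITE = ("DB: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 "
            "250 390 350 407 500 407 750 409 1000 18325 5000 20025 "
            "min 168 max 2609 total 20025")
KV_WRITE = ("KV: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 186 75 3983 "
            "100 16533 250 19873 350 19900 500 19900 750 19900 1000 19900 5000 20000 "
            "min 36 max 1119 total 20000")

for name, hist in (("DB", DB_WRITE), ("KV", KV_WRITE)):
    print(f"{name}: {share_within(hist, 250):.1%} of writes within 250 ms")
```

For the write test this gives roughly 2% of DB writes and over 99% of KV writes completing within 250 ms, which matches the bottom line stated above.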
lakeFS with DynamoDB
Write 20k
$ lakectl abuse random-write --amount 20000 --prefix more-abuse/ lakefs://load-test/main
completed: 20000, errors: 0, current rate: 979.07 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 9
100 90
250 16814
350 19769
500 19968
750 20000
1000 20000
5000 20000
min 64
max 547
total 20000
$ lakectl abuse random-write --prefix more-abuse/ --amount 100000 lakefs://load-test/main
completed: 100000, errors: 0, current rate: 557.01 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 7
100 257
250 83540
350 99268
500 99940
750 99997
1000 99999
5000 100000
min 60
max 1285
total 100000
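A rough percentile can be read off these histograms. The sketch below assumes the counts are cumulative per bucket (inferred from the output, not from lakectl documentation) and returns the smallest bucket that covers a given percentage of requests, so the result is an upper bound rather than an exact latency. `parse_histogram_lines` and `percentile_bucket` are illustrative names.

```python
import math

# The 20k random-write histogram quoted above.
WRITE_20K = """\
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 9
100 90
250 16814
350 19769
500 19968
750 20000
1000 20000
5000 20000
min 64
max 547
total 20000
"""


def parse_histogram_lines(text):
    """Split 'key value' lines into (buckets, stats)."""
    buckets, stats = [], {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip headers and anything that is not 'key count'
        if parts[0].isdigit():
            buckets.append((int(parts[0]), int(parts[1])))
        else:
            stats[parts[0]] = int(parts[1])
    return buckets, stats


def percentile_bucket(buckets, total, pct):
    """Smallest bucket bound (ms) whose cumulative count covers pct% of total."""
    needed = math.ceil(total * pct / 100)
    for upper_ms, cumulative in buckets:
        if cumulative >= needed:
            return upper_ms
    return None  # the percentile lies beyond the largest bucket


buckets, stats = parse_histogram_lines(WRITE_20K)
for pct in (50, 99):
    print(f"p{pct} <= {percentile_bucket(buckets, stats['total'], pct)} ms")
```

For the 20k run this places the median in the 250 ms bucket and the 99th percentile in the 500 ms bucket.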
Read 100K records from uncommitted
The keys file was generated by listing 20k keys from the previous test into a file.
$ lakectl abuse random-read --amount 100000 --from-file keys lakefs://load-test/main
completed: 100000, errors: 0, current rate: 424.47 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 198
75 2281
100 6403
250 94796
350 99640
500 99903
750 99912
1000 99964
5000 100000
min 26
max 1372
total 100000
Commit
Write 100k and measure commit
$ lakectl abuse random-write --amount 100000 lakefs://load-test/main
completed: 100000, errors: 44160, current rate: 4042.46 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 1
75 411
100 3957
250 48425
350 55182
500 55644
750 55741
1000 55747
5000 55770
min 50
max 23801
total 55840
$ time lakectl commit lakefs://load-test/main -m "more abuse"
Branch: lakefs://load-test/main
Commit for branch "main" completed.
ID: 97eb2f4b661e65a3267aa9a547d9cf224a2995f6a2c83767e7971b332c668be0
Message: more abuse
Timestamp: 2022-08-16 06:51:01 +0000 UTC
Parents: c8e2339af89f700669104618a36140e5f7d3ebcee9b9cd7ada30769e5c798de5
real 0m4.204s
user 0m0.033s
sys 0m0.018s
# Commit additional 100k new objects
$ time lakectl commit lakefs://load-test/main -m "test3"
Branch: lakefs://load-test/main
Commit for branch "main" completed.
ID: 7ca0b1e54e098b302db81c9b779a221c83fddfd29c44a245b82794da5b7664b3
Message: test3
Timestamp: 2022-08-16 09:45:41 +0000 UTC
Parents: 97eb2f4b661e65a3267aa9a547d9cf224a2995f6a2c83767e7971b332c668be0
real 0m3.968s
user 0m0.015s
sys 0m0.030s
Listing
$ lakectl abuse random-write --amount 100000 --prefix abuse-for-list/ lakefs://load-test/main
$ time lakectl fs ls lakefs://load-test/main/abuse-for-list/ | wc -l
100000
real 0m29.682s
user 0m3.663s
sys 0m3.365s
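The listing runs in this thread used different record counts (50K for local DB, 100K for local KV and DynamoDB), so wall-clock times are not directly comparable. A minimal sketch that normalizes them to records per second follows; the labels are illustrative, and the figures are the wall-clock totals quoted in this thread. The setups differ (local Postgres versus DynamoDB, and the DynamoDB run was piped through `wc -l` without `--recursive`), so treat the output as a rough orientation only.

```python
def records_per_second(records, wall_seconds):
    """Listing throughput from a record count and wall-clock time."""
    return records / wall_seconds


# (records listed, wall-clock seconds) as reported in this thread.
LISTING_RUNS = {
    "local DB": (50_000, 3.401),
    "local KV": (100_000, 6.533),
    "DynamoDB": (100_000, 29.682),
}

for name, (records, seconds) in LISTING_RUNS.items():
    rate = records_per_second(records, seconds)
    print(f"{name}: {rate:,.0f} records/s")
```

Normalized this way, the two local runs are within a few percent of each other, while the DynamoDB run is several times slower per record.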
Commit every 5 seconds while writing to the same branch
# in the background
# lakectl abuse random-write --prefix abuse-write-1/ lakefs://load-test/main
$ lakectl abuse commit --gap 5s lakefs://load-test/main
completed: 100, errors: 5, current rate: 0.00 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 0
100 0
250 0
350 0
500 0
750 0
1000 5
5000 95
min 815
max 1992
total 95
Note that 5 commit requests failed, at least 3 of them on timeout.