
KV Graveler Benchmarks

Open itaidavid opened this issue 2 years ago • 6 comments

Currently, there are no benchmarks for graveler. This makes it difficult to identify performance regressions caused by the move to KV. We need to implement benchmarks and run them on both SQL and KV. The benchmarks should include at least a representative set of operations for pkg/graveler/ref and pkg/graveler/staging, as these are the packages that access the DB.

DoD: A representative benchmark that runs on both SQL and KV, and a conclusion regarding the performance of graveler over KV.

itaidavid, Jun 25 '22 03:06

Note: We still need to decide which benchmarks are required, i.e. which scenarios we really want to test. One example is multiple parallel commits, which will definitely fail each other until all retries succeed. While this is an exhaustive scenario that stresses performance to the edge, it might not be as interesting as a "real life" scenario. The bottom line is that we can predict such a scenario will show performance degradation for our not-lock-free commit algorithm, but it is probably not interesting. Another example is a single long commit with multiple staging operations taking place at the same time. This scenario is much more common in real-life usage, and a performance regression there is something we had better discover sooner rather than later.
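To illustrate why the parallel-commit scenario is predictable, here is a minimal sketch of a toy model. It is not taken from the lakeFS code: it assumes that in every round all pending committers read the same branch head, exactly one of them updates it successfully, and every other committer retries in the next round. The function name and the round-based model are hypothetical.

```python
def simulate_parallel_commits(num_committers: int) -> int:
    """Count commit attempts in a toy compare-and-set model.

    Assumption (hypothetical, not lakeFS code): all pending committers
    observe the same head, one wins per round, the rest retry.
    """
    head = 0
    pending = list(range(num_committers))
    attempts = 0
    while pending:
        # Every pending committer reads the current head before trying.
        observed = {committer: head for committer in pending}
        still_pending = []
        for committer in pending:
            attempts += 1
            if observed[committer] == head:
                head += 1  # compare-and-set succeeded, head moves on
            else:
                still_pending.append(committer)  # stale head, must retry
        pending = still_pending
    return attempts


print(simulate_parallel_commits(10))  # 55 attempts for 10 commits
```

Under this model N parallel commits cost N(N+1)/2 attempts, so the degradation grows quadratically with contention regardless of the underlying store.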

itaidavid, Jun 29 '22 05:06

https://github.com/treeverse/lakeFS/pull/3669#discussion_r924021857 Add benchmark to BranchByCommitIterator

N-o-Z, Jul 19 '22 14:07

Test plan: Perform the following operations on a lakeFS instance running locally with a Postgres container, and compare KV and DB results for:

  1. Write 100K records.
  2. Read 100K records.
  3. Commit 100K records.
  4. List 100K records.
  5. Commit every 5 seconds while writing to the same branch.

itaiad200, Aug 04 '22 12:08

Results of running locally with Postgres, comparing DB and KV, are below. They include a quick fix for #3888. Bottom line: KV results are equal to or better than the DB ones in all tests performed.

  1. Write 20K records.
  • DB: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 390 350 407 500 407 750 409 1000 18325 5000 20025 min 168 max 2609 total 20025

  • KV: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 186 75 3983 100 16533 250 19873 350 19900 500 19900 750 19900 1000 19900 5000 20000 min 36 max 1119 total 20000

  2. Read 100K records from uncommitted:
  • DB: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 70 75 1389 100 12810 250 108855 350 109231 500 109232 750 109232 1000 109232 5000 109232 min 30 max 373 total 109232

  • KV: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 154 75 4014 100 34105 250 109584 350 109646 500 109646 750 109646 1000 109646 5000 109846 min 27 max 1185 total 109846

  3. Commit records:
  • DB (50K): ./lakectl commit lakefs://repo1/test -m "test" 0.03s user 0.02s system 0% cpu 7.586 total

  • KV (100k): ./lakectl commit lakefs://repo1/main -m "test" 0.03s user 0.02s system 1% cpu 2.733 total

  4. Listing records from uncommitted:
  • DB (50K): ./lakectl fs ls --recursive lakefs://repo1/test/ 0.97s user 0.37s system 39% cpu 3.401 total

  • KV (100K): ./lakectl fs ls --recursive lakefs://repo1/main/ 1.89s user 0.74s system 40% cpu 6.533 total

  5. Commit every 5 seconds while writing to the same branch.
  • DB:

    • Commit: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 0 350 0 500 0 750 0 1000 0 5000 15 min 1653 max 3633 total 15

    • Write: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 119 350 123 500 148 750 173 1000 3449 5000 5159 min 174 max 4395 total 5159

  • KV:

    • Commits: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 0 350 0 500 0 750 0 1000 0 5000 11 min 1472 max 10193 total 15

    • Writes: Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 66 75 2254 100 13472 250 67363 350 68064 500 68085 750 68085 1000 68154 5000 68285 min 37 max 1146 total 68285
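The one-line histograms above can be hard to compare by eye. The following is a minimal sketch that parses such a line and reports the smallest latency bucket covering a given percentile. It assumes the counts are cumulative per bucket upper bound (they are non-decreasing and the last bucket usually equals `total`); the helper names are hypothetical. The two input lines are the DB and KV results of the write test.

```python
def parse_histogram(line):
    """Split 'Histogram (ms): 1 0 2 0 ... min 36 max 1119 total 20000'
    into (bucket_upper_ms, cumulative_count) pairs and a stats dict."""
    tokens = line.split(":", 1)[1].split()
    buckets, stats = [], {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key.isdigit():
            buckets.append((int(key), int(value)))
        else:
            stats[key] = int(value)  # min / max / total
    return buckets, stats


def percentile_bucket(buckets, total, pct):
    """Smallest bucket upper bound (ms) whose cumulative count covers
    pct percent of all requests. Assumes cumulative counts."""
    needed = total * pct / 100
    for upper_ms, cumulative in buckets:
        if cumulative >= needed:
            return upper_ms
    return None  # slower than the largest bucket


db_line = "Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 0 75 0 100 0 250 390 350 407 500 407 750 409 1000 18325 5000 20025 min 168 max 2609 total 20025"
kv_line = "Histogram (ms): 1 0 2 0 5 0 7 0 10 0 15 0 25 0 50 186 75 3983 100 16533 250 19873 350 19900 500 19900 750 19900 1000 19900 5000 20000 min 36 max 1119 total 20000"

db_buckets, db_stats = parse_histogram(db_line)
kv_buckets, kv_stats = parse_histogram(kv_line)
print(percentile_bucket(db_buckets, db_stats["total"], 50))  # 1000
print(percentile_bucket(kv_buckets, kv_stats["total"], 50))  # 100
```

Read this way, the median write falls in the 1000 ms bucket for DB and in the 100 ms bucket for KV.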

itaiad200, Aug 14 '22 06:08

lakeFS with DynamoDB

Write 20k

$ lakectl abuse random-write --amount 20000 --prefix more-abuse/ lakefs://load-test/main

completed: 20000, errors: 0, current rate: 979.07 done/second

Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	0
75	9
100	90
250	16814
350	19769
500	19968
750	20000
1000	20000
5000	20000
min	64
max	547
total	20000
$ lakectl abuse  random-write --prefix more-abuse/ --amount 100000 lakefs://load-test/main

completed: 100000, errors: 0, current rate: 557.01 done/second

Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	0
75	7
100	257
250	83540
350	99268
500	99940
750	99997
1000	99999
5000	100000
min	60
max	1285
total	100000
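Since the histogram rows appear to hold cumulative counts, it can help to convert them into per-bucket counts to see where requests actually land. This is a small sketch under that assumption; `per_bucket_counts` is a hypothetical helper, and the input is the 100K write histogram above with empty leading buckets omitted.

```python
def per_bucket_counts(text):
    """Turn cumulative 'bucket count' rows into per-bucket counts.

    Rows whose first column is not a number (min, max, total) are skipped.
    Assumes the counts are cumulative, which is inferred from the output.
    """
    counts = {}
    previous = 0
    for row in text.strip().splitlines():
        key, value = row.split()
        if not key.isdigit():
            continue
        cumulative = int(value)
        if cumulative > previous:
            counts[int(key)] = cumulative - previous
        previous = cumulative
    return counts


histogram = """
50      0
75      7
100     257
250     83540
350     99268
500     99940
750     99997
1000    99999
5000    100000
min     60
max     1285
total   100000
"""

counts = per_bucket_counts(histogram)
print(counts[250])           # 83283 requests took between 100 and 250 ms
print(sum(counts.values()))  # 100000
```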

Read 100K records from uncommitted

The keys file was generated by listing the 20K keys written in the previous test.

$ lakectl abuse random-read --amount 100000 --from-file keys lakefs://load-test/main

completed: 100000, errors: 0, current rate: 424.47 done/second

Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	198
75	2281
100	6403
250	94796
350	99640
500	99903
750	99912
1000	99964
5000	100000
min	26
max	1372
total	100000

Commit

Write 100K and measure commit

$ lakectl abuse random-write --amount 100000 lakefs://load-test/main

completed: 100000, errors: 44160, current rate: 4042.46 done/second

Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	1
75	411
100	3957
250	48425
350	55182
500	55644
750	55741
1000	55747
5000	55770
min	50
max	23801
total	55840
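The histogram total in this run is lower than the requested amount, which suggests that failed requests are not counted in the histogram. The sketch below checks that reading against the summary lines posted in this comment; the regular expression and helper name are hypothetical, and the second tuple is the commit run reported further down.

```python
import re

SUMMARY = re.compile(r"completed: (\d+), errors: (\d+)")


def successes(summary_line):
    """Return completed minus errors from a lakectl abuse summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a summary line")
    completed, errors = map(int, match.groups())
    return completed - errors


# (summary line, histogram total reported for the same run)
runs = [
    ("completed: 100000, errors: 44160, current rate: 4042.46 done/second", 55840),
    ("completed: 100, errors: 5, current rate: 0.00 done/second", 95),
]
for line, histogram_total in runs:
    print(successes(line) == histogram_total)  # True for both runs
```

For the write run this means roughly 44% of the requests failed, so the latency figures describe only the 55,840 successful writes.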

$ time lakectl commit lakefs://load-test/main -m "more abuse"
Branch: lakefs://load-test/main
Commit for branch "main" completed.

ID: 97eb2f4b661e65a3267aa9a547d9cf224a2995f6a2c83767e7971b332c668be0
Message: more abuse
Timestamp: 2022-08-16 06:51:01 +0000 UTC
Parents: c8e2339af89f700669104618a36140e5f7d3ebcee9b9cd7ada30769e5c798de5


real    0m4.204s
user    0m0.033s
sys	    0m0.018s
# Commit additional 100k new objects
$ time lakectl commit lakefs://load-test/main -m "test3"
Branch: lakefs://load-test/main
Commit for branch "main" completed.

ID: 7ca0b1e54e098b302db81c9b779a221c83fddfd29c44a245b82794da5b7664b3
Message: test3
Timestamp: 2022-08-16 09:45:41 +0000 UTC
Parents: 97eb2f4b661e65a3267aa9a547d9cf224a2995f6a2c83767e7971b332c668be0


real    0m3.968s
user    0m0.015s
sys     0m0.030s

Listing

$ lakectl abuse random-write --amount 100000 --prefix abuse-for-list/ lakefs://load-test/main

$ time lakectl fs ls lakefs://load-test/main/abuse-for-list/ | wc -l 

100000

real    0m29.682s
user    0m3.663s
sys     0m3.365s
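To compare listing runs of different sizes, the wall-clock time can be converted into a throughput figure. This is a minimal sketch that parses the `real` line of the shell `time` output shown above; `real_seconds` is a hypothetical helper and the object count comes from the `wc -l` result.

```python
import re


def real_seconds(time_output):
    """Extract the 'real' wall-clock time, in seconds, from `time` output."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


listing = """
real    0m29.682s
user    0m3.663s
sys     0m3.365s
"""

seconds = real_seconds(listing)
print(f"{100000 / seconds:.0f} objects/second")  # 3369 objects/second
```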

Commit every 5 seconds while writing to the same branch

# in the background
# lakectl abuse random-write --prefix abuse-write-1/ lakefs://load-test/main

$ lakectl abuse commit --gap 5s lakefs://load-test/main
completed: 100, errors: 5, current rate: 0.00 done/second

Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	0
75	0
100	0
250	0
350	0
500	0
750	0
1000	5
5000	95
min	815
max	1992
total	95

Note: 5 commit requests failed, at least 3 of them on timeout.

nopcoder, Aug 16 '22 11:08