Investigate potential lock contention in DBImpl::WriteImpl when writing to the PartitionStore

Open tillrohrmann opened this issue 1 year ago • 1 comments

While benchmarking Restate, I noticed that we spend a lot of time in rocksdb::DBImpl::WriteImpl when committing the PartitionStoreTransaction from the different partition processors. I suspect that this might be caused by lock contention. Unfortunately, the flamegraphs on macOS don't give more insight.

The results of throughput/parallel with main 361e6a8055965ed94b4cd8810642d846aa25f7df were:

throughput/parallel     time:   [397.84 ms 412.47 ms 426.13 ms]
                        thrpt:  [9.3868 Kelem/s 9.6976 Kelem/s 10.054 Kelem/s]

flamegraph

tillrohrmann, Aug 26 '24 10:08

I've tried a simple experiment where every PartitionStore gets its own RocksDB instance to avoid contention completely. The results of the throughput/parallel benchmark are:

throughput/parallel     time:   [354.54 ms 359.25 ms 364.08 ms]
                        thrpt:  [10.986 Kelem/s 11.134 Kelem/s 11.282 Kelem/s]

and the flamegraph no longer shows time spent waiting for the lock when writing to the PartitionStore (DBImpl::WriteImpl):

flamegraph
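
For readers who want to see the shape of the experiment, here is a minimal sketch in plain Rust. It is not the Restate code and does not use RocksDB. `Store` and `run` are hypothetical names, and a `std::sync::Mutex` stands in for whatever serialization happens inside DBImpl::WriteImpl, which is an assumption on my part. With `dedicated == false` all partition writers commit through one shared store; with `dedicated == true` each writer gets its own store, mirroring the one-instance-per-PartitionStore setup.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a store; the real PartitionStore wraps RocksDB.
#[derive(Default)]
struct Store {
    writes: Vec<(usize, u64)>,
}

/// Spawns one writer thread per partition and commits `per_partition`
/// records from each. Returns the total number of records written and the
/// wall-clock time taken.
///
/// `dedicated == false`: all writers share a single store (one lock).
/// `dedicated == true`: every writer gets its own store (no shared lock).
fn run(partitions: usize, per_partition: u64, dedicated: bool) -> (usize, Duration) {
    let store_count = if dedicated { partitions } else { 1 };
    let stores: Vec<Arc<Mutex<Store>>> = (0..store_count)
        .map(|_| Arc::new(Mutex::new(Store::default())))
        .collect();

    let start = Instant::now();
    let handles: Vec<_> = (0..partitions)
        .map(|p| {
            // With a single store every partition maps to index 0.
            let store = Arc::clone(&stores[p % store_count]);
            thread::spawn(move || {
                for seq in 0..per_partition {
                    // The critical section stands in for the commit path
                    // (assumption: the real write path serializes similarly).
                    store.lock().unwrap().writes.push((p, seq));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let elapsed = start.elapsed();

    let total: usize = stores
        .iter()
        .map(|s| s.lock().unwrap().writes.len())
        .sum();
    (total, elapsed)
}
```

Calling `run(4, 1000, false)` and `run(4, 1000, true)` writes the same 4000 records in both cases; only the number of locks the writers compete for changes. The returned durations from a toy like this say nothing about RocksDB itself, so the benchmark numbers above remain the actual evidence.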

tillrohrmann, Aug 27 '24 13:08