raft-rs
[WIP] Add cluster benchmark
Part of https://github.com/tikv/raft-rs/issues/109
This PR tries to add a benchmark for a real raft node cluster. The communication between nodes is done through mpsc::channel, which brings some overhead.
@Hoverbear @hicqu
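For context, the transport here is just `std::sync::mpsc`: each node owns a receiving end and holds a `Sender` for every peer. Below is a minimal, raft-agnostic sketch of that shape; the `Msg`/`Transport` types are illustrative stand-ins, not the PR's actual code.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Illustrative stand-in for `raft::prelude::Message`.
struct Msg {
    to: u64,
    payload: Vec<u8>,
}

/// One mailbox per node plus a sender handle for every peer.
struct Transport {
    inbox: Receiver<Msg>,
    peers: HashMap<u64, Sender<Msg>>,
}

fn build_cluster(ids: &[u64]) -> HashMap<u64, Transport> {
    // Create one channel per node, then hand every node a clone of each
    // other node's sending end.
    let mut senders = HashMap::new();
    let mut inboxes = HashMap::new();
    for &id in ids {
        let (tx, rx) = channel();
        senders.insert(id, tx);
        inboxes.insert(id, rx);
    }
    ids.iter()
        .map(|&id| {
            let peers = senders
                .iter()
                .filter(|(peer, _)| **peer != id)
                .map(|(peer, tx)| (*peer, tx.clone()))
                .collect();
            (id, Transport { inbox: inboxes.remove(&id).unwrap(), peers })
        })
        .collect()
}

fn main() {
    let mut cluster = build_cluster(&[1, 2, 3]);
    let n1 = cluster.remove(&1).unwrap();
    let n2 = cluster.remove(&2).unwrap();
    // Node 1 sends to node 2; node 2 picks the message up from its inbox.
    n1.peers[&2].send(Msg { to: 2, payload: b"proposal".to_vec() }).unwrap();
    assert_eq!(n2.inbox.recv().unwrap().to, 2);
}
```

In the benchmark each node would run such a loop on its own thread and feed received messages into `RawNode::step`; those channel hops (and any shared-state locking) are the overhead discussed below.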
Problems
- [x] ~Criterion complains that the benchmark time will be too long. Consider using raw benching.~ The official `#[bench]` is an unstable feature (see the sketch below).
- [ ] Reduce the overhead in the benchmark
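For reference, "raw benching" here means the built-in harness from the nightly-only `test` crate, which is why it was crossed out; a placeholder sketch of what that would look like:

```rust
// Nightly-only: the built-in bench harness requires the unstable `test` feature.
#![feature(test)]
extern crate test;

#[bench]
fn bench_cluster_propose(b: &mut test::Bencher) {
    b.iter(|| {
        // Placeholder body: the real benchmark would propose to the cluster
        // and wait for the corresponding entry to commit.
        test::black_box(0u64)
    });
}
```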
Signed-off-by: Fullstop000 [email protected]
Sample output:
PS C:\Users\Hoverbear\Git\raft-rs> cargo bench Raft::cluster
Compiling raft v0.6.0-alpha (C:\Users\Hoverbear\Git\raft-rs)
Finished release [optimized] target(s) in 11.61s
Running target\release\deps\raft-18f353cfeaa7c747.exe
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 50 filtered out
Running target\release\deps\benches-fd626a4f421dffb5.exe
Gnuplot not found, disabling plotting
Benchmarking Raft::cluster/1: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.1s or reduce sample count to 10
Raft::cluster/1 time: [32.257 ms 33.791 ms 35.351 ms]
thrpt: [28.288 B/s 29.594 B/s 31.001 B/s]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low severe
1 (10.00%) high mild
Benchmarking Raft::cluster/32: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 57.9s or reduce sample count to 10
Raft::cluster/32 time: [30.280 ms 31.254 ms 32.484 ms]
thrpt: [985.09 B/s 1023.9 B/s 1.0320 KiB/s]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low severe
1 (10.00%) high severe
Benchmarking Raft::cluster/128: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.6s or reduce sample count to 10
Raft::cluster/128 time: [30.699 ms 32.509 ms 34.393 ms]
thrpt: [3.6344 KiB/s 3.8451 KiB/s 4.0718 KiB/s]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low mild
1 (10.00%) high severe
Benchmarking Raft::cluster/512: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 58.3s or reduce sample count to 10
Raft::cluster/512 time: [30.405 ms 32.548 ms 34.221 ms]
thrpt: [14.611 KiB/s 15.362 KiB/s 16.445 KiB/s]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low severe
1 (10.00%) low mild
Benchmarking Raft::cluster/1024: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.0s or reduce sample count to 10
Raft::cluster/1024 time: [28.667 ms 29.879 ms 31.049 ms]
thrpt: [32.207 KiB/s 33.468 KiB/s 34.883 KiB/s]
Benchmarking Raft::cluster/4096: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.3s or reduce sample count to 10
Raft::cluster/4096 time: [29.035 ms 30.146 ms 31.429 ms]
thrpt: [127.27 KiB/s 132.69 KiB/s 137.76 KiB/s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking Raft::cluster/32768: Warming up for 500.00 ms
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.7s or reduce sample count to 10
Raft::cluster/32768 time: [28.394 ms 29.335 ms 30.127 ms]
thrpt: [1.0373 MiB/s 1.0653 MiB/s 1.1006 MiB/s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Criterion complains that the benchmark time will be too long. Consider using raw benching.
Is this the error?
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 56.6s or reduce sample count to 10
Yes, that's the problem. I've temporarily limited both the sample count and the measurement time to keep the benchmark's running time in an acceptable range, but the results still don't look very desirable. Maybe the implementation brings too much overhead through channels and mutexes. Do you have any suggestions? :) @Hoverbear
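For concreteness, limiting the run looks roughly like this with Criterion's group API (a sketch assuming a Criterion 0.3-style `benchmark_group`; the bench body is a placeholder rather than the PR's actual cluster code):

```rust
use std::time::Duration;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_cluster(c: &mut Criterion) {
    let mut group = c.benchmark_group("Raft::cluster");
    // Keep the total running time acceptable: 10 samples in a ~10s window per
    // input size (which is exactly what triggers the warnings above).
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(10));
    for &size in &[1usize, 32, 128, 512, 1024, 4096, 32768] {
        // Report throughput as bytes of proposed data per second.
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| {
                // Placeholder: the real body proposes a `size`-byte payload to
                // the cluster and waits for it to be committed.
                black_box(vec![0u8; size])
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_cluster);
criterion_main!(benches);
```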
Yeah, I think at this point we're mostly benchmarking channels and overheads of the benchmark itself. :(
That is kind of the realistic case though: most raft clusters run over networks and thus have much more network overhead than CPU time.
I wonder if it might be more valuable to try to capture the time it takes for a node (the leader and probably separately a follower) to process a received message from a client or other node, and then respond.
This way we could perhaps avoid measuring the channel/mutex times?
After trying many different approaches based on Criterion.rs, I'm starting to believe that the channel-based cluster is hard to bench with it, because for now it can't handle long-running benchmarks very well (as described in the referenced issue). :(
So I've reconsidered the problem: the target is to measure the speed of committing logs, which means we need to measure, for each proposal, the duration between proposing and consuming the corresponding committed entry. Therefore, maybe we don't need an async-communication-style cluster, since we can simulate all of the above inside the leader node alone. The total process from a proposal to an entry being committed in the leader:
- `step` a proposal
- Send `MsgAppend`s and get `MsgAppendResp`s. I think this step can be simplified into a dynamic time cost based on the msg size, but it might be hard to define the rule.
- `step` all the `MsgAppendResp`s
- Consume a `Ready`

And now we have a fully sync-styled process in the leader node, so we can measure the duration easily without any overheads (rough sketch below).
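A rough sketch of that sync-styled, leader-only idea is below. It assumes raft 0.6 with `slog` (the `Ready` fields `messages`/`committed_entries` are version-dependent), forces leadership with the public `become_candidate`/`become_leader` helpers instead of running an election, and answers every `MsgAppend` instantly rather than with the size-based time cost mentioned above.

```rust
use std::time::Instant;

use raft::prelude::*; // Config, RawNode, Message, MessageType, Entry, ...
use raft::storage::MemStorage;
use slog::{o, Discard, Logger};

/// Drain every pending Ready: persist new entries, answer each outgoing
/// MsgAppend with an immediate (mocked) success from the "follower", and
/// collect whatever the leader reports as committed.
fn drive(node: &mut RawNode<MemStorage>, storage: &MemStorage) -> Vec<Entry> {
    let mut committed = Vec::new();
    while node.has_ready() {
        let mut rd = node.ready();
        storage.wl().append(rd.entries()).unwrap();
        let msgs: Vec<Message> = rd.messages.drain(..).collect();
        if let Some(ents) = rd.committed_entries.take() {
            committed.extend(ents);
        }
        node.advance(rd);
        for m in msgs {
            if m.get_msg_type() == MessageType::MsgAppend {
                // Pretend the follower accepted everything it was sent.
                let mut resp = Message::default();
                resp.set_msg_type(MessageType::MsgAppendResponse);
                resp.set_from(m.get_to());
                resp.set_to(m.get_from());
                resp.set_term(m.get_term());
                resp.set_index(m.get_index() + m.get_entries().len() as u64);
                node.step(resp).unwrap();
            }
        }
    }
    committed
}

fn main() {
    let config = Config {
        id: 1,
        election_tick: 10,
        heartbeat_tick: 3,
        ..Default::default()
    };
    // Three voters, but only the leader really exists; 2 and 3 are mocked.
    let storage = MemStorage::new_with_conf_state((vec![1, 2, 3], vec![]));
    let logger = Logger::root(Discard, o!());
    let mut node = RawNode::new(&config, storage.clone(), &logger).unwrap();

    // Skip the election round trips entirely.
    node.raft.become_candidate();
    node.raft.become_leader();
    drive(&mut node, &storage); // flush the empty entry appended on election

    // Measured region: propose -> (mocked) replication -> entry committed.
    let start = Instant::now();
    node.propose(vec![], vec![0u8; 128]).unwrap();
    let committed = drive(&mut node, &storage);
    assert!(!committed.is_empty());
    println!("proposal committed in {:?}", start.elapsed());
}
```

The measured region could then go straight into a Criterion `iter` closure (with the reduced sample settings from above), so the benchmark only times raft's own processing, with no channels or mutexes involved.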
What do you think about this idea? @Hoverbear @hicqu
@Fullstop000 So you'd mock the non-leader actions?
@Hoverbear Yes, though I still want a benchmark over the channel-based cluster (maybe create a specialized harness for this?).
And I realize that with the no-overhead approach the whole routine is composed of several function calls and an RTT: Leader `step(proposal)` -> Send entry -> Follower `step(MsgAppend)` -> Send resp (nearly static time) -> Leader `step(MsgAppendResp)` -> Leader consumes a `Ready`. The RTT part can be mocked well, and it seems we can just bench the remaining steps individually.
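The mocked RTT could be as simple as a size-based cost function; the constants below are made-up placeholders that would have to be calibrated against real measurements:

```rust
use std::time::Duration;

/// Hypothetical latency model for one MsgAppend/MsgAppendResp round trip:
/// a fixed per-message cost plus a per-byte transfer cost.
fn mock_rtt(msg_bytes: usize) -> Duration {
    let base = Duration::from_micros(200); // per-message overhead (placeholder)
    let per_byte = Duration::from_nanos(10); // roughly a 100 MB/s link (placeholder)
    base + per_byte * msg_bytes as u32
}

fn main() {
    // The benchmark would charge this delay before stepping the mocked
    // follower's response back into the leader.
    println!("{:?}", mock_rtt(4096)); // ~241µs
}
```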
Seems to make sense to me :)