rDSN
rDSN copied to clipboard
potential secondary learn too slow to catch up with primary
In our test, we found that the potential secondary learn too slow:
log.201.txt:D12:14:41.839 (1470744881839169361 bc6c) replica5.replica_long3.080400030000001d: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 3754 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (894646 => 897430), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897431, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:44.532 (1470744884532517850 bc6a) replica5.replica_long1.0804000300000372: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 6448 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897430 => 897690), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897691, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:47.068 (1470744887068260243 bc69) replica5.replica_long0.080400030000057d: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 8983 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897690 => 897820), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897821, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
...
log.211.txt:D12:38:55.960 (1470746335960825825 bc69) replica5.replica_long0.08040003000571ba: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1457876 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991185 => 991475), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991476, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.286 (1470746340286794584 bc69) replica5.replica_long0.080400030005764f: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462202 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991475 => 991780), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991781, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.377 (1470746340377718938 bc69) replica5.replica_long0.0804000300057af4: replica.learn:843:on_copy_remote_state_completed(): [email protected]:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462293 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991780 => 992088), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 992089, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
As you can see, the learning process lasts for tens of minutes.
That is:
- on each learning round, PotentialSecondary copies a private log from Primary and applies it, which learns hundreds of mutations and takes 2~3 seconds; but during the the time, Primary commits another hundreds of mutations, and the private log grows.
- then, PotentialSecondary need to start another round of learning
The reason is:
- on each round, PotentialSecondary need to copy the whole latest private log file, only to learn the last several mutations
- the private log will be switched (create new private log) only if its size exceed a limit (which is 32MB in our config)
- so we copy lots of unnecessary data on each learning round
Improvement suggestion:
- if Primary detects that the private log is in learning, it can switch the private log immediately to reduce copying size of private log.
@imzhenyu , do you have better improvements?
There is a configuration for the mutation cache size. You may enlarge the cache size to avoid learning the private log as well in many cases.
Another way is to use RPC + AIO to pipeline log read + private log apply to avoid unnecessary log written on potential secondary's disk.