databend retry backoff for meta replication

retry backoff for meta replication

Open everpcpc opened this issue 2 years ago • 5 comments

Summary

When a node in the cluster restarts or is in a bad state, the log file fills up with the same error message repeated at high frequency.

Sep 19 '22 04:09 everpcpc

I'd recommend a crate that I maintain myself: https://github.com/Xuanwo/backon

Sep 19 '22 09:09 Xuanwo

This is high priority. Also cc @lichuang to have a look before @ariesdevil is back from the fuse engine external location work.

Sep 20 '22 08:09 BohuTANG

> I'd recommend a crate that I maintain myself: https://github.com/Xuanwo/backon

Looks great

Sep 20 '22 13:09 drmingdrmer

Since @lichuang & @ariesdevil are busy with other projects, can @ClSlaid take a look at this issue? This issue is marked as prio: high.

Sep 21 '22 17:09 Xuanwo

> Since @lichuang & @ariesdevil are busy with other projects, can @ClSlaid take a look at this issue? This issue is marked as prio: high.

It'd be nice if @ClSlaid could help with this:

It can be done by wrapping three of the raft-network APIs in a backoff loop: send_append_entries(), send_vote() and send_install_snapshot():

https://github.com/datafuselabs/databend/blob/9e4d7da64f831ab863585c3152af58c905e70041/src/meta/service/src/network.rs#L115

Sep 22 '22 02:09 drmingdrmer
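
A minimal sketch of the kind of backoff loop suggested above, using only the Rust standard library. `Backoff` and `retry_with_backoff` are hypothetical names introduced for illustration; they are not part of databend or of the backon crate. The closure stands in for one of the raft-network calls, and the blocking `sleep` is a simplification: in async code it would presumably be replaced by an async timer.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical backoff settings; field names and values are illustrative only.
#[derive(Clone, Copy, Debug)]
pub struct Backoff {
    /// Delay after the first failure.
    pub initial: Duration,
    /// Upper bound for any single delay.
    pub max: Duration,
    /// Multiplier applied after each further failure.
    pub factor: u32,
    /// Total number of calls allowed (at least one call is always made).
    pub max_attempts: u32,
}

impl Backoff {
    /// Delay after the `failures`-th failure (0-based):
    /// `initial * factor^failures`, capped at `max`.
    pub fn delay(&self, failures: u32) -> Duration {
        let multiplier = self.factor.saturating_pow(failures);
        self.initial.saturating_mul(multiplier).min(self.max)
    }
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponentially growing delays between attempts.
/// Returns the last error if every attempt fails.
pub fn retry_with_backoff<T, E>(
    policy: Backoff,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                attempt += 1;
                if attempt >= policy.max_attempts {
                    return Err(err);
                }
                // Blocking sleep: a simplification for this synchronous sketch.
                sleep(policy.delay(attempt - 1));
            }
        }
    }
}
```

Because the delay between attempts grows up to `max`, an unreachable peer produces progressively fewer error log lines over time, which is the symptom the issue describes. Logging only on the final failure, or at a reduced level for intermediate failures, would be a separate design choice.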
