
Suboptimal default timeout in FollowerPingRpc causing master quorum breaks

Open • kaikash opened this issue 9 months ago • 1 comment

Hello!

I'm encountering an issue where the default value of FollowerPingRpcTimeout, currently 1 second, appears to be too low. Master followers frequently experience delays during the fork call and freeze for extended periods, which results in the error "Error pinging follower". This error, in turn, breaks the master quorum. It might also be worth changing other election manager defaults.
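To illustrate why a short ping timeout interacts badly with a long stall, here is a minimal toy model in Python. It is only a sketch and not YTsaurus code: the function name, the stall and ping-period values, and the rule that a stalled follower answers as soon as the stall ends are all assumptions made for illustration.

```python
def failed_pings(stall_start, stall_duration, ping_period, rpc_timeout, horizon):
    """Count pings that time out in a toy leader/follower model.

    Assumptions (hypothetical, not taken from YTsaurus):
    - the leader sends one ping every `ping_period` seconds up to `horizon`;
    - the follower is unresponsive during [stall_start, stall_start + stall_duration),
      e.g. while it is blocked in fork();
    - a ping sent during the stall is answered the moment the stall ends;
    - network latency is ignored.
    """
    stall_end = stall_start + stall_duration
    failed = 0
    for i in range(int(horizon / ping_period)):
        sent_at = i * ping_period
        if stall_start <= sent_at < stall_end:
            answered_at = stall_end  # follower replies only after it unfreezes
        else:
            answered_at = sent_at    # healthy follower replies immediately
        if answered_at - sent_at > rpc_timeout:
            failed += 1
    return failed


if __name__ == "__main__":
    # Hypothetical 3-second stall, pings every 0.5 s, 10 s observation window.
    for timeout in (1.0, 5.0):
        n = failed_pings(2.0, 3.0, 0.5, timeout, 10.0)
        print(f"rpc_timeout={timeout}s -> {n} timed-out pings")
```

In this model any stall longer than the RPC timeout produces at least one timed-out ping, while a timeout above the worst-case stall produces none. Whether a single failed ping is enough to break the quorum depends on the election manager's actual logic, which this sketch does not model.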

kaikash, May 08 '24 15:05

The same applies to leader_lease_timeout and leader_lease_grace_delay in hydra_manager.

kaikash, May 11 '24 16:05