featurebase
featurebase copied to clipboard
NodeLeave gossip event reported when cluster under heavy load
Expected behavior
only report node leave when either node crashes or signals a departure from the cluster
Actual behavior
when cluster performing a long running operation > hours. gossip is determining that a node has died du to lack of response to probes.
Information about your environment (OS/architecture, CPU, RAM, cluster/solo, configuration, etc.)
3 node oci cluster 930 GB ram 144 cores 72 TB nvme storage
Possible that increasing gossip.suspicion-mult
will improve this behavior. Need to test.