foundationdb
FDB cluster in three_datacenter mode becomes unavailable if one of the three DCs has a network card failure
I have a FDB cluster in three_datacenter mode. Each DC has 6 machines.
The cluster becomes unavailable when I randomly take down the public network interface (ifdown bond1) on machines in the primary DC, which I identify with fdbcli --exec 'status json' | jq .cluster.active_primary_dc.
The problem is hard to reproduce.
Any advice on how to address this?
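Because the failure is intermittent, it may help to record the cluster state every time an interface is taken down. Below is a minimal Python sketch, not a tested tool. It assumes fdbcli is on the PATH and that the status json document carries client.database_status.available and cluster.configuration.redundancy_mode next to the cluster.active_primary_dc field used above. The helper names fetch_status and summarize_status are hypothetical.

```python
import json
import subprocess


def fetch_status(timeout_s=30):
    """Run `fdbcli --exec 'status json'` and return the parsed document.

    Assumption: fdbcli is on the PATH and can find the cluster file.
    While the cluster is unavailable this call may time out or raise.
    """
    result = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return json.loads(result.stdout)


def summarize_status(status):
    """Pick out the fields relevant to this report from a status document.

    Missing fields are reported as None (or False for availability)
    instead of raising, since a degraded cluster can return partial status.
    """
    client = status.get("client", {})
    cluster = status.get("cluster", {})
    return {
        "available": client.get("database_status", {}).get("available", False),
        "active_primary_dc": cluster.get("active_primary_dc"),
        "redundancy_mode": cluster.get("configuration", {}).get("redundancy_mode"),
    }
```

Calling summarize_status(fetch_status()) before and after each ifdown, and logging the result with a timestamp, would show whether the active primary DC changed and when availability was lost.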
Could you try three_data_hall mode, with each data hall bound to one data center? We know three_data_hall can survive this type of failure. FDB is not resilient to asymmetric network partitions, but that does not seem to be what your tests produce.
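To check that each data hall really is bound to exactly one data center, the process localities in status json can be grouped by DC. This is a rough Python sketch that assumes each entry under cluster.processes has a locality object with dcid and data_hall keys. The function names are hypothetical, and the input is an already parsed status document.

```python
from collections import defaultdict


def halls_per_dc(status):
    """Map each dcid to the set of data_hall values seen on its processes.

    Assumption: localities appear under cluster.processes.<id>.locality
    with "dcid" and "data_hall" keys; a missing key is recorded as None.
    """
    halls = defaultdict(set)
    processes = status.get("cluster", {}).get("processes", {})
    for proc in processes.values():
        locality = proc.get("locality", {})
        halls[locality.get("dcid")].add(locality.get("data_hall"))
    return dict(halls)


def one_hall_per_dc(status):
    """Return True if every DC has exactly one data hall and no hall is shared."""
    mapping = halls_per_dc(status)
    if not mapping or None in mapping:
        return False
    all_halls = [hall for halls in mapping.values() for hall in halls]
    each_dc_has_one = all(
        len(halls) == 1 and None not in halls for halls in mapping.values()
    )
    no_hall_shared = len(set(all_halls)) == len(mapping)
    return each_dc_has_one and no_hall_shared
```

If one_hall_per_dc returns False, halls_per_dc shows which DC has processes with a missing or mismatched data hall locality.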