FDB cluster in three_datacenter mode becomes unavailable if one of three DCs has a network card failure

Open • Rjerk opened this issue 3 years ago • 1 comment

I have a FDB cluster in three_datacenter mode. Each DC has 6 machines.

I found that the cluster becomes unavailable if I randomly bring down machines' public network interfaces (ifdown bond1) in the primary DC (fdbcli --exec 'status json' | jq .cluster.active_primary_dc).

The problem is hard to reproduce.
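Since the failure is intermittent, recording the active primary DC and the reported availability around each interface-down attempt may make it easier to correlate. The following is a minimal sketch, not a definitive tool: `summarize_status` and `fetch_status` are hypothetical helper names, `cluster.active_primary_dc` is the field used in the jq command above, and reading availability from `client.database_status.available` is an assumption about the `status json` layout that should be checked against the output of your FDB version.

```python
import json
import subprocess


def summarize_status(status_json_text):
    """Pull the active primary DC and availability out of `status json` output."""
    status = json.loads(status_json_text)
    cluster = status.get("cluster", {})
    client = status.get("client", {})
    return {
        # Same field the jq command in this issue reads.
        "active_primary_dc": cluster.get("active_primary_dc"),
        # Assumption: availability is reported at client.database_status.available.
        "available": client.get("database_status", {}).get("available"),
    }


def fetch_status(fdbcli="fdbcli"):
    """Run `fdbcli --exec 'status json'` and return its stdout as text."""
    result = subprocess.run(
        [fdbcli, "--exec", "status json"],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,  # avoid hanging forever if the cluster is unreachable
    )
    return result.stdout
```

Calling `summarize_status(fetch_status())` before and after each `ifdown bond1` gives a small record of which DC was primary and whether the database was reported as available at that moment.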

Any advice on how to address this?

Rjerk • Dec 01 '22 09:12

Could you try three_data_hall mode, with each data hall bound to a data center? We know three_data_hall can survive this type of failure. FDB is not resilient against asymmetric network partitions, but that does not seem to be the case in your tests.

jzhou77 • Dec 05 '22 17:12