
Support three_data_hall redundancy

Status: Open · simenl opened this issue 4 years ago · 9 comments

Many cloud providers have Availability Zones within their regions. These zones are independent from each other, such that failures affecting multiple zones are less likely.

We'd like to run FoundationDB with awareness of these zones, so that FoundationDB remains available and suffers no data loss during a full outage of a single availability zone. The three_data_hall redundancy mode seems like a great fit, using the availability zone as the locality_data_hall for each process.
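For context, three_data_hall is an existing FoundationDB redundancy mode: data is stored in triplicate, with one copy on a storage server in each of three data halls, so the cluster can tolerate the loss of an entire hall. Enabling it is a one-line database configuration change (sketch only; this must be run against a live cluster):

```shell
# Switch the database to three_data_hall redundancy via fdbcli.
fdbcli --exec 'configure three_data_hall'
```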

simenl avatar Oct 20 '20 12:10 simenl

Idea: add an option to the cluster configuration that takes a list of three availability zones (or 'data halls') and a corresponding label key. The availability zone is often exposed as a label on the Kubernetes node, e.g. topology.kubernetes.io/zone.

```yaml
availability_zone_key: 'topology.kubernetes.io/zone'
availability_zones: ['europe-west1-b', 'europe-west1-c', 'europe-west1-d']
```

The Kubernetes operator would use a nodeSelector to decide which availability zone new pods are placed in, e.g.:

```yaml
spec:
  nodeSelector:
    topology.kubernetes.io/zone: europe-west1-b
```

The availability zone would be passed down to the pod so that the fdbserver process can set the correct locality_data_hall.
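One way that could look in the generated pod spec, assuming the operator injects the zone it selected as an environment variable (the variable name FDB_DATA_HALL and the flag wiring are illustrative, not current operator behavior):

```yaml
# Illustrative fragment only; names and values are assumptions.
spec:
  nodeSelector:
    topology.kubernetes.io/zone: europe-west1-b
  containers:
    - name: foundationdb
      env:
        - name: FDB_DATA_HALL            # assumed name, set by the operator
          value: europe-west1-b
      command: ["/usr/bin/fdbserver"]
      args:
        - --class=storage
        - --locality_data_hall=$(FDB_DATA_HALL)  # Kubernetes expands $(VAR) in args
```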

The Kubernetes operator would also need to ensure that, for each process type, the number of pods is balanced across the availability zones.
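A minimal sketch of such a balancing rule (the function name and tie-breaking are assumptions, not operator code): when creating a new pod of a given process class, pick the configured zone that currently holds the fewest pods of that class.

```python
from collections import Counter

def pick_zone(zones, existing_pod_zones):
    """Return the configured zone with the fewest pods.

    zones: list of configured availability zones.
    existing_pod_zones: the zone of each current pod of this process class.
    Ties are broken by zone name so placement is deterministic.
    """
    counts = Counter(existing_pod_zones)
    return min(zones, key=lambda z: (counts[z], z))

# Example: with two logs in -b and one in -c, the next log goes to -d.
print(pick_zone(
    ["europe-west1-b", "europe-west1-c", "europe-west1-d"],
    ["europe-west1-b", "europe-west1-b", "europe-west1-c"],
))  # europe-west1-d
```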

simenl avatar Oct 20 '20 13:10 simenl

Are you thinking of a topology where there is a single Kubernetes cluster that spans an entire region? Would you be using the hostname as the lower-order fault domain within each availability zone?

brownleej avatar Oct 20 '20 19:10 brownleej

> Are you thinking of a topology where there is a single Kubernetes cluster that spans an entire region? Would you be using the hostname as the lower-order fault domain within each availability zone?

Yes, a single multi-zonal Kubernetes cluster spanning the whole region (3 availability zones). The hostname, which is unique per node, would be used as the lower-order fault domain to ensure that logs are not stored on the same node within an availability zone.
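Concretely, each process would then carry two locality fields: the node hostname as the zone id (the lower-order, node-level fault domain) and the availability zone as the data hall. A sketch of the resulting fdbserver flags (FDB_DATA_HALL is an assumed injected variable, not something the operator sets today):

```shell
# Sketch only: zoneid = node hostname, data_hall = availability zone.
fdbserver --class storage \
  --locality_zoneid "$(hostname)" \
  --locality_data_hall "${FDB_DATA_HALL}"
```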

simenl avatar Oct 21 '20 09:10 simenl

Hi; I'm a colleague of @simenl's. Ideally we would use the largest non-AZ failure domain as the lower-order fault domain within the AZ, but few cloud providers expose that information (in AKS we believe it to be the rack). We're working with our providers on getting precise failure-domain information, though.

rbtcollins avatar Oct 21 '20 11:10 rbtcollins

Thanks for that context! I agree this seems like a useful configuration to support. We'll also likely want to support configurations where each AZ is its own Kubernetes cluster, but I think it will be reasonable enough to support both.

brownleej avatar Oct 21 '20 14:10 brownleej

Coming back to this: In AKS, node labels don't expose fault domains below the AZ level unless there are no AZs: https://kubernetes-sigs.github.io/cloud-provider-azure/topics/availability-zones/#node-labels. We've spoken with MS and there are no plans to change this. We can run 3 Kubernetes clusters, but I'm not entirely clear on the benefits to FDB of doing that. I am clear on the costs to us, though: we'd need to ensure encryption in transit across cluster boundaries, which is something we have no other use case for today. So I'm glad that you're up for supporting the one-Kubernetes-cluster configuration.

Is there any sort of timeline on this happening, and / or is it something that we could look at implementing ourselves?

rbtcollins avatar Jan 29 '21 13:01 rbtcollins

We don't have any timelines on this, and I'd be happy for someone outside of Apple to take a swing at it.

brownleej avatar Jan 29 '21 18:01 brownleej

Hi @brownleej and @johscheuer

I would like to start working on this enhancement. Like @simenl, we use a single K8s cluster spread across 3+ AZs.

The design document is really helpful for getting the overall idea, but I would appreciate it if you could help me split this into more actionable tasks.

Some of the assumptions, like "The current assumption is that we will implement the multi-DC support before we implement this design", haven't been met yet. I don't mind working on those, but our priority is to get this working for a single cluster first.

Also I am not sure about the status of https://github.com/FoundationDB/fdb-kubernetes-operator/issues/348

I am looking forward to your feedback 🙇‍♂️ Thank you

manfontan avatar Aug 23 '22 13:08 manfontan

> Hi @brownleej and @johscheuer
>
> I would like to start working on this enhancement. Like @simenl, we use a single K8s cluster spread across 3+ AZs.
>
> The design document is really helpful for getting the overall idea, but I would appreciate it if you could help me split this into more actionable tasks.

I believe the first step would be to implement support for localities without actually adding support for the three_data_hall redundancy mode. After that, changing the existing methods to support three_data_hall, including the change in coordinator selection, would be the next and last step.
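To make the coordinator-selection change concrete, here is a hypothetical sketch (not the operator's actual algorithm; the function name, the 9-coordinator total, and the 3-per-hall split are assumptions for illustration): spread the coordinators evenly across the three data halls, choosing deterministically within each hall.

```python
def select_coordinators(candidates, per_hall=3):
    """Pick coordinators spread evenly across data halls.

    candidates: iterable of (process_id, data_hall) pairs.
    Returns per_hall process ids from each hall, sorted for determinism.
    """
    by_hall = {}
    for pid, hall in candidates:
        by_hall.setdefault(hall, []).append(pid)
    if len(by_hall) < 3:
        raise ValueError("three_data_hall needs processes in three data halls")
    chosen = []
    for hall in sorted(by_hall):
        chosen.extend(sorted(by_hall[hall])[:per_hall])
    return chosen
```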

> Some of the assumptions, like "The current assumption is that we will implement the multi-DC support before we implement this design", haven't been met yet. I don't mind working on those, but our priority is to get this working for a single cluster first.

This references the following design: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/plugin_multi_fdb_support.md, and just adding the CRD changes without the actual implementation is probably enough to start working on the three_data_hall design.

> Also I am not sure about the status of #348

It's still an open issue, since that support has not been implemented yet.

> I am looking forward to your feedback 🙇‍♂️ Thank you

I hope that helped.

johscheuer avatar Aug 23 '22 14:08 johscheuer