fdb-kubernetes-operator
Support three_data_hall redundancy
Many cloud providers have Availability Zones within their regions. These zones are independent of each other, so that failures affecting multiple zones are less likely.
We'd like to run FoundationDB with awareness of these zones, so that FoundationDB remains available and suffers no data loss during a full outage of a single availability zone.
The three_data_hall redundancy mode seems like a great fit, using the availability zone as the locality_data_hall for each process.
Idea:
Add an option to provide a list of three availability zones (or 'data halls') in the cluster configuration, along with a corresponding label key. The availability zone is often exposed as a label on the Kubernetes node, e.g. topology.kubernetes.io/zone.
availability_zone_key: 'topology.kubernetes.io/zone'
availability_zones: ['europe-west1-b', 'europe-west1-c', 'europe-west1-d']
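As a rough sketch of how this could sit in the FoundationDBCluster resource (the field names and their placement here are just an illustration of the proposal, not a final design):
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: sample-cluster
spec:
  version: 6.2.30
  # Hypothetical fields mirroring the options above.
  availabilityZoneKey: topology.kubernetes.io/zone
  availabilityZones:
    - europe-west1-b
    - europe-west1-c
    - europe-west1-d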
The Kubernetes operator would use a nodeSelector to decide which availability zone new pods are placed in, e.g.:
spec:
  nodeSelector:
    topology.kubernetes.io/zone: europe-west1-b
The availability zone would be passed down to the pod so that the fdb process can set the correct locality_data_hall.
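For illustration, a minimal sketch of how the zone could reach the FDB process in the generated pod; the FDB_DATA_HALL variable name and the injection mechanism are assumptions, and in practice the flag would likely end up in the fdbmonitor conf rather than in container args:
containers:
  - name: foundationdb
    env:
      # Hypothetical variable; the operator would populate it from the node's zone label.
      - name: FDB_DATA_HALL
        value: europe-west1-b
    args:
      # Locality flag handed to the FDB process; exact spelling may vary by FDB version.
      - --locality_data_hall=$(FDB_DATA_HALL)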
The Kubernetes operator would need to make sure that the number of pods in each availability zone is balanced for each process type (e.g. nine storage pods across three zones would be placed three per zone).
Are you thinking of a topology where there is a single Kubernetes cluster that spans an entire region? Would you be using the hostname as the lower-order fault domain within each availability zone?
Yes, a single multi-zonal Kubernetes cluster spanning the whole region (3 availability zones). The hostname, which is unique per node, would be used as the lower-order fault domain to ensure that logs are not stored on the same node within an availability zone.
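As a hedged sketch, the resulting locality settings for a single process might then look like this (the node name is made up, and in practice these would likely be set through the fdbmonitor conf):
args:
  - --locality_data_hall=europe-west1-b  # availability zone of the node
  - --locality_zoneid=gke-node-abc123    # node hostname, the lower-order fault domain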
Hi; I'm a colleague of @simenl's. Ideally we would have the largest non-AZ failure domain as the lower-order fault domain within the AZ, but few cloud providers expose that (e.g. we believe it to be the rack in AKS). We're working with our providers on getting precise failure domain information, though.
Thanks for that context! I agree this seems like a useful configuration to support. We'll also likely want to support configurations where each AZ is its own Kubernetes cluster, but I think it will be reasonable enough to support both.
Coming back to this: In AKS, node labels don't expose fault domains below the AZ level unless there are no AZs: https://kubernetes-sigs.github.io/cloud-provider-azure/topics/availability-zones/#node-labels. We've spoken with MS and there are no plans to change this. We can run 3 Kubernetes clusters, but I'm not entirely clear on the benefits to FDB of doing that. I am clear on the costs to us though: we'd need to ensure encryption in motion across cluster boundaries, which is something we don't have any other use cases for today. So I'm glad that you're up for supporting the one-kubernetes-cluster configuration.
Is there any sort of timeline on this happening, and / or is it something that we could look at implementing ourselves?
We don't have any timelines on this, and I'd be happy for someone outside of Apple to take a swing at it.
Hi @brownleej and @johscheuer
I would like to start working on this enhancement. Like @simenl, we use a single K8s cluster spread across 3+ AZs.
The design document is really helpful to get the overall idea, but I would appreciate it if you could help me split this into more actionable tasks.
Some of the assumptions, like "The current assumption is that we will implement the multi-DC support before we implement this design", haven't been met yet. I don't mind working on those, but our priority is to get this working for a single cluster first.
Also I am not sure about the status of https://github.com/FoundationDB/fdb-kubernetes-operator/issues/348
I am looking forward to your feedback 🙇♂️ Thank you
> The design document is really helpful to get the overall idea, but I would appreciate it if you could help me split this into more actionable tasks.
I believe the first step would be to implement support for localities without actually adding support for the three_data_hall redundancy mode. After that, changing the existing methods to support three_data_hall would be the next and last step, including the change in the coordinator selection.
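For reference, a hedged sketch of what the end state could look like in the cluster spec once this is in place (the databaseConfiguration block already exists; treat the exact field spelling as an assumption):
spec:
  databaseConfiguration:
    # three_data_hall would only be accepted once the operator supports it end to end.
    redundancy_mode: three_data_hall
Coordinator selection would then need to spread coordinators across all three data halls, e.g. nine coordinators with three per hall.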
> Some of the assumptions, like "The current assumption is that we will implement the multi-DC support before we implement this design", haven't been met yet. I don't mind working on those, but our priority is to get this working for a single cluster first.
This references the following design: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/plugin_multi_fdb_support.md. Just adding the CRD changes without the actual implementation is probably enough to start working on the three_data_hall design.
> Also I am not sure about the status of #348
It's still an open issue since the support is not implemented.
I hope that helped.