
Feature: Uniformly distribute nodes across zones by default

Open bernardhalas opened this issue 11 months ago • 0 comments

Motivation

Typically a cloud provider region consists of several zones, which represent isolated datacenters in geographical proximity, or in some cases isolated "fire-cells" in the same datacenter.

The general recommendation is to spread the cluster nodes within a region across these availability zones for resiliency. Claudie currently doesn't support this well enough.

For example, creating a new nodepool currently requires both a region and a zone. In a cluster with a single autoscaled nodepool, all the nodes are created in the same availability zone, so if that zone suffers an outage, the whole cluster workload goes down with it.

If the workload were spread out across, say, 3 zones, only about 1/3 of the workload would be affected should a single zone fail.

Description

Make the following spec field optional on AWS/Azure/GCP/OCI/Hetzner: nodePools.dynamic[].providerSpec.zone.

If providerSpec.zone is not specified, the terraformer templates should be enhanced to:

  • pull the list of available zones from data sources
  • assign a zone to each new node in rotation (e.g. by using a node id % zone-count approach)
  • still respect providerSpec.zone if it is specified
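
The assignment logic could look roughly like the sketch below. It is written in Python purely for illustration; the real change would live in the terraformer templates. The names `assign_zone` and `zone_offset` are hypothetical, and the per-nodepool offset derived from a CRC32 of the nodepool name is only one assumed way to avoid always starting in the first zone.

```python
import zlib
from typing import Optional, Sequence


def zone_offset(nodepool_name: str, zone_count: int) -> int:
    """Hypothetical helper: a deterministic starting offset per nodepool,
    so that the first node does not always land in the first zone."""
    return zlib.crc32(nodepool_name.encode("utf-8")) % zone_count


def assign_zone(
    node_index: int,
    zones: Sequence[str],
    nodepool_name: str,
    pinned_zone: Optional[str] = None,
) -> str:
    """Pick a zone for the node with the given zero-based index."""
    # An explicitly specified providerSpec.zone always wins.
    if pinned_zone:
        return pinned_zone
    if not zones:
        raise ValueError("no zones available for the region")
    # Sort so the result does not depend on the order the provider returns.
    ordered = sorted(zones)
    offset = zone_offset(nodepool_name, len(ordered))
    # The "node id % zone-count" rotation, shifted by the nodepool offset.
    return ordered[(node_index + offset) % len(ordered)]
```

Because the rotation is a pure function of the node index, rerunning it for existing nodes yields the same zones, which matters when templates are re-rendered. It does not rebalance after nodes are removed out of order; that would need separate handling.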

Exit criteria

  • [ ] providerSpec.zone is optional
  • [ ] the first node is not hardcoded to zone 1
  • [ ] nodes from a single nodepool are uniformly distributed across all zones in the region
  • [ ] docs are updated
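
One possible way to check the distribution criterion in a test is sketched below. It assumes that "uniformly distributed" means the per-zone node counts differ by at most one; that reading and the function name `is_uniformly_distributed` are assumptions, not something this issue defines.

```python
from collections import Counter
from typing import Iterable, Sequence


def is_uniformly_distributed(node_zones: Iterable[str], zones: Sequence[str]) -> bool:
    """Return True if the nodes are spread over `zones` so that the
    per-zone counts differ by at most one (assumed definition)."""
    if not zones:
        raise ValueError("no zones available for the region")
    counts = Counter(node_zones)
    # A node placed in a zone outside the region's list is a failure.
    if set(counts) - set(zones):
        return False
    # Zones with no nodes count as zero.
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone) <= 1
```

Here `node_zones` is the zone of each node in a single nodepool and `zones` is the full list of zones in the region.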

bernardhalas, Jan 06 '25 17:01