containers-roadmap
[EKS/Fargate] [request]: Schedule evenly pod replicas across AZs
Tell us about your request
At the moment, we have observed that there is no guarantee that an EKS cluster on Fargate schedules pod replicas evenly across multiple Availability Zones.
Which service(s) is this request for?
Fargate, EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We have a Fargate profile with 1 namespace and 2 private subnets. We have a deployment with 2 replicas, and we would like each replica to be deployed in a different AZ. At the moment this behaviour is not guaranteed.
We identified two test cases:
- Scaling a deployment from 1 replica to 2 replicas: we observed that 5 times out of 10 the replicas were deployed in the same AZ.
- Starting from a new deployment configured with 2 replicas: we observed that 3 times out of 10 the replicas were deployed in the same AZ.
Are you currently working around this issue?
Currently we are not aware of any workaround.
@lorenzomicheli and anyone else following this - Has anyone tried using K8s 1.18's topologySpreadConstraints to deal with this problem? Or will that not work as the Fargate scheduler actually triggers the "node" provision? I'm assuming it does not, but wanted to check before I tried myself 😄
I have the same question as @Gowiem. Unless I'm doing something wrong, topologySpreadConstraints does not seem to be a solution here. Has anyone tried this with success?
Hi! We have a hard requirement of having at least one pod scheduled to each of 3 AZs in a specific region. Am I right to assume that we have to use managed node groups because topologySpreadConstraints are not (yet?) supported with EKS+Fargate?
Scheduling evenly is not always exact, but I can recommend using affinity.podAntiAffinity as described here: https://aws.github.io/aws-eks-best-practices/reliability/docs/application/#schedule-replicas-across-nodes
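For anyone who wants to try that suggestion, here is a minimal sketch of the podAntiAffinity pattern adapted to spread across zones rather than nodes. The deployment name and the app: my-app label are placeholders, and this uses soft (preferred) anti-affinity, so the scheduler may still co-locate replicas when it has no other choice:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app      # repel replicas of the same app from each other
                topologyKey: topology.kubernetes.io/zone   # spread across AZs
      containers:
        - name: my-app
          image: "nginx"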
When topologySpreadConstraints didn't work as advertised, I found this issue and tried the last comment. The affinity.podAntiAffinity solution also did not do the trick. For example, out of 3 available AZs, two pods were scheduled in b and two in c, and when scaling to a fifth it was scheduled in b.
I would love to hear if there is a plan to get one of these, or some other way to achieve the objective of them, working.
There are two current approaches. I recommend the second one. For both approaches, you will have to make a deployment for each AZ.
Approach 1
- Prepare subnet1 and subnet2 for AZ1 and AZ2, respectively.
- Prepare FP1 with subnet1 and an FP1 selector.
- Prepare FP2 with subnet2 and an FP2 selector. This selector must be different from FP1's selector; don't use the same selector for both profiles.
- For the replica set you want in FP1, launch it with the labels for FP1, and do the same for FP2 (see the sketch after this list).
Don't leave room for ambiguity on which fargate profile a pod should be mapped to.
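To illustrate the last step, here is a rough sketch of a per-profile Deployment, assuming FP1's selector matches the namespace my-namespace plus the label profile: fp1 (all names and labels here are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-az1                      # placeholder name
  namespace: my-namespace            # must match FP1's namespace selector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      profile: fp1
  template:
    metadata:
      labels:
        app: my-app
        profile: fp1                 # matches FP1's label selector, so the pod lands in subnet1/AZ1
    spec:
      containers:
        - name: app
          image: "nginx"

A second Deployment with profile: fp2 (and its own name) does the same for FP2/AZ2.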
Approach 2 (recommended)
Create a Fargate profile with subnets for both AZ1 and AZ2. You can then specify which AZ a pod should be placed in via the pod spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fargate-zone-a               # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fargate-zone-a
  template:
    metadata:
      labels:
        app: fargate-zone-a
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a
      containers:
        - name: fargate
          image: "nginx"
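Since this approach needs a Deployment per AZ, the copy for the second zone differs only in its name, labels, and the zone value; a minimal sketch with placeholder names:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fargate-zone-b               # placeholder name for the second-AZ copy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fargate-zone-b
  template:
    metadata:
      labels:
        app: fargate-zone-b
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1b   # only the zone (and names) change
      containers:
        - name: fargate
          image: "nginx"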
@Youssef-Beltagy although this solves the "physical" location of the pods, how do you add things like routing, services, autoscaling, etc. on top of these actually separate deployments?
Any update on this?
Doesn't #1125 solve this? Something along the lines of:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: 'topology.kubernetes.io/zone'
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        ...
I've found that the following config gives me somewhat the desired behavior:
topologySpreadConstraints:
  - maxSkew: 1        # only 1 pod diff per AZ
    minDomains: 3     # use at least 3 AZs
    topologyKey: 'topology.kubernetes.io/zone'
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        ...
This way at least 3 zones are used; anything scaled above that is still somewhat random and not always perfectly balanced. ScheduleAnyway vs. DoNotSchedule doesn't really make a difference.
This was using 1 Fargate profile with a subnet in each AZ. When I use 3 Fargate profiles, each with only 1 subnet, all pods are picked up by the same profile and end up in the same AZ.
Any input?
This does not seem to be working for us. We have the following topologySpreadConstraints on our deployment:
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
But sometimes we see both fargate nodes in the same AZ:
name az nodepool instance-type arch
fargate-ip-100-80-192-164.eu-west-1.compute.internal eu-west-1a <none> <none> amd64
fargate-ip-100-80-194-106.eu-west-1.compute.internal eu-west-1a <none> <none> amd64
I looked at https://github.com/aws/containers-roadmap/issues/1125 referenced above, but it only seems to talk about using nodeSelector or nodeAffinity rather than topologySpreadConstraints.