containers-roadmap
[EKS/Fargate] [request]: Schedule evenly pod replicas across AZs
Tell us about your request
At the moment, we have observed that there is no guarantee that an EKS cluster on Fargate schedules pod replicas evenly across multiple Availability Zones.
Which service(s) is this request for?
Fargate, EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We have a Fargate profile with 1 namespace and 2 private subnets. We have a deployment with 2 replicas, and we would like each replica to be deployed in a different AZ. At the moment this behaviour is not guaranteed.
We identified two test cases:
- Scaling a deployment from 1 replica to 2 replicas: we observed that 5 times out of 10 the replicas were deployed in the same AZ.
- Starting from a new deployment configured with 2 replicas: we observed that 3 times out of 10 the replicas were deployed in the same AZ.
Are you currently working around this issue?
Currently we are not aware of any workaround.
@lorenzomicheli and anyone else following this - Has anyone tried using K8s 1.18's topologySpreadConstraints to deal with this problem? Or will that not work as the Fargate scheduler actually triggers the "node" provision? I'm assuming it does not, but wanted to check before I tried myself 😄
I have the same question as @Gowiem. Unless I'm doing something wrong, topologySpreadConstraints does not seem to be a solution here. Has anyone tried this with success?
Hi! We have a hard requirement of having at least one pod scheduled to each of 3 AZs in a specific region. Am I right to assume that we have to use managed node groups because topologySpreadConstraints are not (yet?) supported with EKS+Fargate?
Scheduling evenly is not always exact, but I can recommend using affinity.podAntiAffinity as described here: https://aws.github.io/aws-eks-best-practices/reliability/docs/application/#schedule-replicas-across-nodes
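For anyone who wants to try that suggestion, here is a minimal sketch of the podAntiAffinity pattern adapted to spread across zones rather than nodes. The deployment name and the app: my-app label are placeholders, and this uses soft (preferred) anti-affinity, so the scheduler may still co-locate replicas when it has no other choice:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app      # repel replicas of the same app from each other
                topologyKey: topology.kubernetes.io/zone   # spread across AZs
      containers:
        - name: my-app
          image: "nginx"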
When topologySpreadConstraints didn't work as advertised, I found this issue and tried the last comment. The affinity.podAntiAffinity solution also did not do the trick. For example, out of 3 available AZs, two pods were scheduled in b and two in c, and when scaling to a fifth it was scheduled in b.
I would love to hear if there is a plan to get one of these, or some other way to achieve the objective of them, working.
There are two current approaches. I recommend the second one. For both approaches, you will have to make a deployment for each AZ.
Approach 1
- Prepare subnet1 and subnet2 for AZ1 and AZ2, respectively.
- Prepare FP1 with subnet1 and an FP1 selector.
- Prepare FP2 with subnet2 and an FP2 selector. This selector must be different from FP1's selector; don't use the same selector for both profiles.
- For the replica set you want in FP1, launch it with the labels for FP1, and do the same for FP2 (see the sketch after this list).
Don't leave room for ambiguity on which fargate profile a pod should be mapped to.
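To illustrate the last step, here is a rough sketch of a per-profile Deployment, assuming FP1's selector matches the namespace my-namespace plus the label profile: fp1 (all names and labels here are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-az1                      # placeholder name
  namespace: my-namespace            # must match FP1's namespace selector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      profile: fp1
  template:
    metadata:
      labels:
        app: my-app
        profile: fp1                 # matches FP1's label selector, so the pod lands in subnet1/AZ1
    spec:
      containers:
        - name: app
          image: "nginx"

A second Deployment with profile: fp2 (and its own name) does the same for FP2/AZ2.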
Approach 2 (recommended)
Create a Fargate profile with subnets for both AZ1 and AZ2. You can then specify which AZ a pod should be placed in via the pod spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fargate-zone-a               # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fargate-zone-a
  template:
    metadata:
      labels:
        app: fargate-zone-a
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a
      containers:
        - name: fargate
          image: "nginx"
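Since this approach needs a Deployment per AZ, the copy for the second zone differs only in its name, labels, and the zone value; a minimal sketch with placeholder names:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fargate-zone-b               # placeholder name for the second-AZ copy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fargate-zone-b
  template:
    metadata:
      labels:
        app: fargate-zone-b
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1b   # only the zone (and names) change
      containers:
        - name: fargate
          image: "nginx"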
@Youssef-Beltagy although this solves the "physical" location of the pods, how do you add things like routing, services, autoscaling, etc. on top of these actually separate deployments?
Any update on this?
Doesn't #1125 solve this? Something along the lines of:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: 'topology.kubernetes.io/zone'
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        ...
I've found that the following config gives me somewhat the desired behavior:
topologySpreadConstraints:
  - maxSkew: 1        # only 1 pod diff per AZ
    minDomains: 3     # use at least 3 AZs
    topologyKey: 'topology.kubernetes.io/zone'
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        ...
This way at least 3 zones are used; anything scaled above that is still somewhat random and not always perfectly balanced. ScheduleAnyway vs. DoNotSchedule doesn't really make a difference.
This was using 1 Fargate profile with a subnet in each AZ. When I use 3 Fargate profiles, each with only 1 subnet, all pods are picked up by the same profile and end up in the same AZ.
Any input?
This does not seem to be working for us. We have the following topologySpreadConstraints on our deployment:
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
But sometimes we see both fargate nodes in the same AZ:
name az nodepool instance-type arch
fargate-ip-100-80-192-164.eu-west-1.compute.internal eu-west-1a <none> <none> amd64
fargate-ip-100-80-194-106.eu-west-1.compute.internal eu-west-1a <none> <none> amd64
I looked at https://github.com/aws/containers-roadmap/issues/1125 referenced above, but it only seems to talk about using nodeSelector or nodeAffinity rather than topologySpreadConstraints.