eksctl

[Bug] Entire availability zone is removed if one of the instance types is not available in it

Open matti opened this issue 2 years ago • 5 comments

What were you trying to accomplish?

Create managed nodegroups from a list of instance types

What happened?

I have the following list of instance types from ec2-instance-selector:

c5a.8xlarge c6i.8xlarge c6in.8xlarge m5.8xlarge m6i.8xlarge m7i-flex.8xlarge m7i.8xlarge r5.8xlarge r5b.8xlarge r5n.8xlarge r6i.8xlarge

In https://github.com/eksctl-io/eksctl/pull/6464, @TiberiuGC fixed https://github.com/eksctl-io/eksctl/issues/6461 by skipping every availability zone where at least one of the requested instance types is not available:

skipping eu-north-1a from selection because it doesn't support the following instance type(s): r5b.8xlarge

This is very unfortunate, as I use ec2-instance-selector only to get a list of instance types that fit certain criteria. I don't actually care what the exact instance types are.

So now I'm missing one availability zone in my node groups.

matti avatar Sep 12 '23 14:09 matti

Here is a sample of the ec2-instance-selector invocation I use to get the list I want. I cannot use eksctl's built-in instance selector, as it is bare-bones and doesn't support all the filtering options provided by ec2-instance-selector.

        ec2-instance-selector \
          --region="$region" \
          --availability-zones="$zones" \
          --vcpus="$vcpus" \
          --memory-min="$memory" \
          --hypervisor nitro \
          --cpu-architecture x86_64 \
          --deny-list "^vt.|^inf.|d\.|en\.|dn\." \
          --gpus 0 \
          --network-performance-max "$network_performance_max" \
          --root-device-type ebs \
          --usage-class="$class" \
          --price-per-hour-max "$price_max" \
          --max-results 100
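
One possible workaround, sketched here under assumptions rather than taken from eksctl: filter the selector's output so that only instance types offered in every target AZ remain, which leaves eksctl no reason to skip a zone. `offered_zones` and `types_in_all_zones` are hypothetical helper names; the sketch assumes a configured AWS CLI and relies on the EC2 `describe-instance-type-offerings` call.

```shell
# Hypothetical helpers; not part of eksctl or ec2-instance-selector.

# Print the AZs (one per line) in region $1 that offer instance type $2.
# Assumes a configured AWS CLI.
offered_zones() {
  aws ec2 describe-instance-type-offerings \
    --region "$1" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$2" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text | tr '\t' '\n'
}

# $1: region, $2: comma-separated zones, remaining args: instance types.
# Print only the instance types that are offered in every listed zone.
# The body runs in a subshell, so its variables stay local.
types_in_all_zones() (
  region=$1
  zones=$(printf '%s' "$2" | tr ',' ' ')
  shift 2
  for type in "$@"; do
    offered=$(offered_zones "$region" "$type")
    ok=1
    for zone in $zones; do
      # -x: whole-line match, -F: literal string (no regex)
      printf '%s\n' "$offered" | grep -qxF "$zone" || { ok=0; break; }
    done
    if [ "$ok" -eq 1 ]; then
      echo "$type"
    fi
  done
)
```

Assuming `$zones` holds a comma-separated list and the selector prints one instance type per line, the selector output could be passed through as `types_in_all_zones "$region" "$zones" $(ec2-instance-selector ...)`. The trade-off is a shorter instance type list in exchange for keeping every AZ.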

matti avatar Sep 12 '23 14:09 matti

Hi @matti. If we allow eu-north-1a to be selected, EKS may actually try to create r5b.8xlarge instances within this AZ, as it does not have any kind of filtering/validation mechanism to avoid that. By skipping the AZ, eksctl is merely preventing you from running into the error caught in https://github.com/eksctl-io/eksctl/issues/6461. So, even if we don't skip the AZ, you may occasionally run into a nodegroup creation error.

With that in mind, what would be your desired behaviour?

TiberiuGC avatar Sep 18 '23 08:09 TiberiuGC

Error out instead, as I am requesting something that cannot be fulfilled.
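
Until eksctl behaves that way, a fail-fast check could be run before creating the nodegroup. The following is only a sketch: `require_all_offered` is a hypothetical function, not an eksctl feature, and it assumes the offerings are supplied on stdin as "instance-type zone" pairs.

```shell
# Hypothetical pre-flight check; eksctl itself does not provide this.
# stdin: "<instance-type> <zone>" pairs, one per line, for example from
#   aws ec2 describe-instance-type-offerings --region "$region" \
#     --location-type availability-zone \
#     --query 'InstanceTypeOfferings[].[InstanceType,Location]' --output text
# $1: comma-separated zones, remaining args: instance types.
# Reports every missing (type, zone) combination on stderr and returns
# non-zero if there is at least one.
require_all_offered() (
  zones=$(printf '%s' "$1" | tr ',' ' ')
  shift
  offerings=$(cat)
  status=0
  for type in "$@"; do
    for zone in $zones; do
      # Exact field comparison, so dots in type names are not wildcards.
      if ! printf '%s\n' "$offerings" |
        awk -v t="$type" -v z="$zone" \
          '$1 == t && $2 == z { found = 1 } END { exit !found }'; then
        echo "error: $type is not offered in $zone" >&2
        status=1
      fi
    done
  done
  exit "$status"
)
```

Chaining it in front of the nodegroup creation command (with `&&`) would stop the run with an explicit message instead of silently dropping a zone.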

matti avatar Sep 18 '23 08:09 matti

Right now the behaviour is, IMO, unexpected: it "silently" (unless I follow the logs) removes an AZ that I expect to be used.

matti avatar Sep 18 '23 11:09 matti

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Oct 27 '23 01:10 github-actions[bot]