[ENH] - Add ability to spin up dask workers in a single AZ (AWS)
Feature description
Ability to spin up dask workers in a single Availability Zone (AZ) in AWS.
Value and/or benefit
When running data-intensive tasks on dask workers, the workers are often spun up across several AZs (Availability Zones). This can cause a lot of data transfer between AZs, which is not cheap.
Having this ability would make spinning up a large number of dask workers much more cost efficient.
Anything else?
No response
Is this related to/fixed by:
https://docs.qhub.dev/en/stable/source/admin_guide/faq.html?highlight=availability#on-aws-why-do-user-instances-occasionally-die-30-minutes-after-spinning-up-a-large-dask-cluster
@dharhas I believe that FAQ fixes another issue. I tried making the change that was suggested and new nodes are still split between the two AZs.
From my perspective, there's a potential short-term solution and a long-term solution that will require a potential update to how we create AWS node-groups.
short term solution
Disable one of the network subnets for the associated AutoScaling group.
- To perform this action, on the AWS console, navigate to EC2 > Auto Scaling Groups and select the appropriate auto-scaling group.
- Under Network, remove all but one subnet. This will force all new nodes to spin up using that subnet (and subsequently only in one AZ).
This workaround has the drawback that the associated node-group will raise a "Health Issue":
AutoScalingGroupInvalidConfiguration: it wants two subnets in separate AZs
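For anyone who would rather script the console steps above, here is a minimal sketch using boto3. It assumes AWS credentials and a default region are already configured, and the function names (`keep_single_subnet`, `pin_asg_to_one_subnet`) are just illustrative helpers, not part of QHub. I'd expect it to trigger the same "Health Issue" as the manual change, since it does the same thing.

```python
from typing import Optional


def keep_single_subnet(vpc_zone_identifier: str, keep: Optional[str] = None) -> str:
    """Pick exactly one subnet ID out of a comma-separated VPCZoneIdentifier string.

    If `keep` is None the first listed subnet is used; otherwise `keep` must
    already be one of the group's subnets.
    """
    subnets = [s.strip() for s in vpc_zone_identifier.split(",") if s.strip()]
    if not subnets:
        raise ValueError("auto-scaling group has no subnets configured")
    if keep is None:
        return subnets[0]
    if keep not in subnets:
        raise ValueError(f"{keep} is not one of {subnets}")
    return keep


def pin_asg_to_one_subnet(asg_name: str, keep: Optional[str] = None) -> str:
    """Restrict an auto-scaling group to one subnet (hypothetical helper).

    Only affects where NEW nodes are launched; existing nodes stay put.
    """
    import boto3  # imported lazily so keep_single_subnet works without boto3

    client = boto3.client("autoscaling")
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    groups = response["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"no auto-scaling group named {asg_name!r}")

    subnet = keep_single_subnet(groups[0]["VPCZoneIdentifier"], keep)
    client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        VPCZoneIdentifier=subnet,
    )
    return subnet
```

Calling `pin_asg_to_one_subnet("<your-asg-name>")` would keep the first subnet listed on the group; pass `keep="subnet-..."` to choose a specific one.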
long term solution
I believe the long term solution is to have an option to force the node-group to run in a single subnet (i.e., a single AZ). An initial attempt at this solution can be found on the aws_single_subnet branch.
I tested the "long term solution" (on branch aws_single_subnet) and from what I can tell, all of the nodes in the worker node-group spawned in a single AZ (provided that the key single_subnet = true was set in the node-group section). It's probably worth testing this a little more to ensure there are no other unintended consequences.
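To make further testing a bit easier, here is a rough sketch of a check that counts nodes per AZ. It assumes the input has the shape of the `Instances` list that boto3's `describe_auto_scaling_groups` returns for a group (each entry carrying an `AvailabilityZone` key); the helper names are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def instances_per_az(instances: Iterable[Dict]) -> Counter:
    """Count instances by their AvailabilityZone field."""
    return Counter(instance["AvailabilityZone"] for instance in instances)


def is_single_az(instances: Iterable[Dict]) -> bool:
    """True when every instance sits in the same AZ (or there are none)."""
    return len(instances_per_az(instances)) <= 1


# Example with hand-written data standing in for a real API response.
sample = [
    {"InstanceId": "i-001", "AvailabilityZone": "us-west-2a"},
    {"InstanceId": "i-002", "AvailabilityZone": "us-west-2a"},
    {"InstanceId": "i-003", "AvailabilityZone": "us-west-2b"},
]
print(instances_per_az(sample))
print(is_single_az(sample))
```

Against a live cluster, the list could come from `boto3.client("autoscaling").describe_auto_scaling_groups(AutoScalingGroupNames=[name])["AutoScalingGroups"][0]["Instances"]`, checked after scaling the worker node-group up.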