Add option for AWS node-groups to run in a single subnet/AZ

Open iameskild opened this issue 2 years ago • 3 comments

Fixes | Closes | Resolves #1388

Changes introduced in this PR:

  • This adds a node-group option that creates the node group in a single subnet (i.e., a single AZ). At the moment, this will be the default behavior for the worker node-group, but there might be interest in making it the default for all node-groups.
    • Note: this only affects AWS.

Types of changes

What types of changes does your PR introduce?

Put an x in the boxes that apply

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds a feature)
  • [ ] Breaking change (fix or feature that would cause existing features to not work as expected)
  • [ ] Documentation Update
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes, no API changes)
  • [ ] Build related changes
  • [ ] Other (please describe):

Testing

Requires testing

  • [x] Yes
  • [ ] No

I have manually tested this, but more testing might be needed to ensure that we are not breaking other features.

In case you checked yes, did you write tests?

  • [ ] Yes
  • [ ] No

Documentation

Does your contribution include breaking changes or deprecations? If so have you updated the documentation?

  • [ ] Yes, docstrings
  • [ ] Yes, main documentation
  • [ ] Yes, deprecation notices

Further comments (optional)

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered and more.

iameskild avatar Sep 08 '22 16:09 iameskild

@iameskild, just a quick heads-up that this change does have the effect of limiting available IP addresses, which limits the number of concurrent workers.

brl0 avatar Sep 30 '22 19:09 brl0

Thanks for the feedback @brl0! This was a concern that I had and I'm glad we were able to confirm one way or the other.

iameskild avatar Oct 04 '22 16:10 iameskild

@brl0 we recently added an AWS EKS addon (aws-ebs-csi-driver), and I believe there is another addon that will allow us to reduce the number of IP addresses assigned to each of the pods (if I remember correctly, it's currently ~30 per pod).

I recall researching this before and indeed I did haha - https://github.com/Quansight/qhub/issues/828#issuecomment-935116226 That said, I will likely read through this a bit more but this is promising :)

Have a look here for more info: https://github.com/aws/amazon-vpc-cni-k8s

iameskild avatar Oct 20 '22 04:10 iameskild

@iameskild, I think this same configuration also needs to be applied to the node group for user instances, since there is significant data transfer between user instances and workers when computing large dataframes, etc. If this change is not applied to user instances, additional fees could be incurred for transfers between them and the workers.

Also, just curious, but do you know if the subnet mask size can be increased to avoid the unnecessarily low worker count constraint that this change otherwise implies?
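
For a rough sense of the numbers behind the subnet-size question, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are hypothetical examples, not the ranges this project actually uses, and the sketch assumes every node and pod takes its address from the single subnet. AWS reserves five addresses in each subnet (the first four and the last one), so a shorter prefix (for example /20 instead of /24) is what raises the ceiling.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses in the subnet remain after AWS's reserved ones."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


# Hypothetical subnet ranges, chosen only to show how the prefix length scales.
for cidr in ("10.10.0.0/24", "10.10.0.0/20", "10.10.0.0/18"):
    print(f"{cidr}: {usable_ips(cidr)} usable addresses")
```

Under those assumptions, the usable count is an upper bound on nodes plus pods in that subnet; the real worker limit will be lower once other workloads and any pre-allocated addresses are taken into account.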

brl0 avatar Dec 11 '22 16:12 brl0

@costrouc @viniciusdc this is the last PR that needs to be included in the January release 🎉

iameskild avatar Jan 27 '23 15:01 iameskild