
Moves nodes to private subnets.

Open dcmcand opened this issue 9 months ago • 4 comments

Reference Issues or PRs

Closes #2952

What does this implement/fix?

Moves nodes to private subnets and removes the auto-assign public IP option.

Currently, our nodes are placed in public subnets with a public IP assigned by default. This is a security vulnerability that gives us no benefit whatsoever. The new setup places all nodes in private subnets while keeping load balancers in public subnets. Nebari remains publicly accessible through the load balancers, but the nodes themselves can no longer be reached over the public internet.

The following illustration is from the AWS documentation (https://docs.aws.amazon.com/eks/latest/best-practices/subnets.html) and shows the new setup. Note that this is the recommended setup for EKS on AWS.

[Image: illustration of the recommended subnet layout from the AWS EKS documentation]

Put an x in the boxes that apply

  • [X] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds a feature)
  • [ ] Breaking change (fix or feature that would cause existing features not to work as expected)
  • [ ] Documentation Update
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes, no API changes)
  • [ ] Build related changes
  • [ ] Other (please describe):

Testing

  • [X] Did you test the pull request locally?
  • [ ] Did you add new tests?

How to test this PR?

Deploy Nebari to AWS. In the AWS console, validate that the nodes are located in private subnets, then go through the testing checklist to validate that all functionality is unchanged.
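The console check could also be scripted. The following is a minimal sketch, not part of this PR: it assumes instance and route-table records shaped like the boto3 EC2 `describe_instances` and `describe_route_tables` responses, and flags any node that has a public IP or sits in a subnet whose route table routes to an internet gateway. The function names and the sample data are hypothetical.

```python
def public_subnet_ids(route_tables):
    """Subnets explicitly associated with a route table that routes to an internet gateway.

    Subnets that rely on the VPC's main route table (no explicit association)
    are not covered by this sketch.
    """
    public = set()
    for table in route_tables:
        routes_to_igw = any(
            route.get("GatewayId", "").startswith("igw-")
            for route in table.get("Routes", [])
        )
        if routes_to_igw:
            public.update(
                assoc["SubnetId"]
                for assoc in table.get("Associations", [])
                if "SubnetId" in assoc
            )
    return public


def exposed_nodes(instances, route_tables):
    """Return IDs of instances that have a public IP or sit in a public subnet."""
    public = public_subnet_ids(route_tables)
    return [
        inst["InstanceId"]
        for inst in instances
        if inst.get("PublicIpAddress") or inst.get("SubnetId") in public
    ]


# Made-up sample data, for illustration only.
route_tables = [
    {"Routes": [{"GatewayId": "igw-0abc"}], "Associations": [{"SubnetId": "subnet-public"}]},
    {"Routes": [{"NatGatewayId": "nat-0def"}], "Associations": [{"SubnetId": "subnet-private"}]},
]
instances = [
    {"InstanceId": "i-general", "SubnetId": "subnet-private"},
    {"InstanceId": "i-worker", "SubnetId": "subnet-public", "PublicIpAddress": "203.0.113.10"},
]

print(exposed_nodes(instances, route_tables))  # ['i-worker']
```

After a deployment with this change, the expectation would be an empty list for the cluster's node instances.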

Any other comments?

NOTE: This will likely result in issues with the general node not restarting if it ends up in a different AZ from its EBS volume. This is a known issue and needs to be addressed by changing our storage setup.
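To illustrate the failure mode: an EBS volume can only be attached to an instance in the same availability zone, so a volume is stranded whenever no node exists in its AZ. The sketch below is a hypothetical helper with made-up IDs and zones, not code from this PR.

```python
def stranded_volumes(volume_azs, node_azs):
    """Return IDs of volumes whose AZ has no node available to attach them."""
    available = set(node_azs)
    return sorted(vol for vol, az in volume_azs.items() if az not in available)


# Made-up example: the general node's volume lives in us-east-1a.
volume_azs = {"vol-general": "us-east-1a"}

print(stranded_volumes(volume_azs, ["us-east-1b"]))                # ['vol-general']
print(stranded_volumes(volume_azs, ["us-east-1a", "us-east-1b"]))  # []
```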

dcmcand avatar Mar 28 '25 14:03 dcmcand

NOTE: This will likely result in issues with the general node not restarting if it ends up in a different AZ from its EBS volume. This is a known issue and needs to be addressed by changing our storage setup.

This issue is outlined in https://github.com/nebari-dev/nebari/issues/3008. @dcmcand, @viniciusdc, and I discussed this limitation and decided to address #3008 first, before merging this PR.

marcelovilla avatar Mar 31 '25 16:03 marcelovilla

Do not merge until https://github.com/nebari-dev/nebari/issues/3008 is fixed, as this will cause difficulties with upgrades.

dcmcand avatar Apr 01 '25 11:04 dcmcand

We need the upgrade path in the next release (the follow-up release). This is the last remaining bit to get this going.

viniciusdc avatar Oct 02 '25 15:10 viniciusdc

I don't think I can help much with providing an upgrade path, but I did want to share my feedback from using this for a while in production. We eventually dropped this change from our deployment because of the high cost of moving data through the NAT Gateway.
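For a rough sense of how that cost scales: NAT Gateway billing has an hourly charge per gateway plus a per-GB data processing charge, so data-heavy workloads behind private subnets are dominated by the second term. The sketch below is illustrative only; the function name is hypothetical and the rates in the example are placeholders, not quoted AWS prices.

```python
def nat_gateway_monthly_cost(gateways, gb_processed, hourly_rate, per_gb_rate, hours=730):
    """Estimate monthly NAT Gateway cost: fixed hourly charge plus data processing."""
    fixed = gateways * hours * hourly_rate   # charged per gateway per hour
    traffic = gb_processed * per_gb_rate     # charged per GB passing through the gateway
    return fixed + traffic


# Placeholder rates; substitute the current prices for your region.
estimate = nat_gateway_monthly_cost(
    gateways=3, gb_processed=50_000, hourly_rate=0.05, per_gb_rate=0.05
)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```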

When we moved back from private to public subnets, our upgrade path was a little awkward. It's the reverse of what an upgrade path for this PR would be, but in case it's helpful, here's how I managed to move from private to public subnets.

  • nebari hangs on the deletion of a public subnet, so I manually deleted the Elastic Load Balancer and nebari proceeded
  • similarly, nebari tries to delete the private subnets, but these depend on the EKS cluster, which also had to be terminated manually
  • the first run of the deployment eventually fails with a 404 for GET /auth/admin/realms/nebari/default-groups (IIUC this is caused by the removal of the EKS cluster invalidating the keycloak state, but tofu doesn't know that)
  • in the keycloak configuration stage, run tofu state rm keycloak_default_groups.default, then redeploy
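The last step could be wrapped in a small script. This is a minimal sketch under assumptions: the helper names and the stage directory path are hypothetical, and it presumes a `tofu` binary on the PATH and an already initialized working directory for that stage. Only the `tofu state rm keycloak_default_groups.default` command itself comes from the steps above.

```python
import subprocess


def state_rm_command(address, binary="tofu"):
    """Build the argv for removing one resource address from the state."""
    return [binary, "state", "rm", address]


def remove_from_state(address, stage_dir, dry_run=True):
    """Run `tofu state rm` in stage_dir; with dry_run=True, only return the command."""
    cmd = state_rm_command(address)
    if not dry_run:
        # check=True raises CalledProcessError if tofu exits with a non-zero status.
        subprocess.run(cmd, cwd=stage_dir, check=True)
    return cmd


# Hypothetical path; point this at the keycloak configuration stage directory.
print(remove_from_state("keycloak_default_groups.default", "path/to/keycloak-configuration-stage"))
```

Defaulting to a dry run makes it possible to inspect the command before it modifies any state.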

This upgrade path does lose state, though: I had to restore the keycloak and conda-store state from backups.

asmacdo avatar Oct 06 '25 20:10 asmacdo