Distributed-CellProfiler

Spot instance not joining the cluster when ASSIGN_IP = False

Open bethac07 opened this issue 1 year ago • 2 comments

Kamal ran into an issue where his spot instances were not joining the cluster with ASSIGN_IP=False; setting it to True made the instances appear in the cluster again. So something funny is happening.

bethac07, Nov 19 '24 20:11

The instances were not discoverable by ECS. This may be related to the subnet needing to be private. Possibly relevant: an AWS blog post with charge details and alternative configurations; an example with a NAT gateway (not sure this is actually the way we need or want to go); and an AWS ECS CLI issue with potentially helpful information.

ErinWeisbart, Nov 25 '24 22:11

(Not solved yet, notes)

After a bit of reading: we need to either keep all of our containers in Amazon ECR and use PrivateLink, and/or set up NAT Gateways (probably with a load balancer) that can access the internet. These are necessary because a) the EC2 instance needs to be able to talk to ECS to get instructions on what it should do (load a container, shut down a container, etc.) and b) it needs to be able to talk to some sort of container registry, either Amazon's or DockerHub, to actually get the containers.
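For debugging, a rough way to check points a) and b) from inside an instance launched without a public IP is to try opening a TCP connection to the relevant endpoints. The sketch below uses only the standard library. The function names are hypothetical, and the endpoint list is an assumption and not exhaustive: it covers the regional ECS API endpoint plus either the ECR API endpoint or Docker Hub's registry, while the ECS agent may contact additional hosts.

```python
import socket


def required_endpoints(region, use_ecr):
    """Return (host, port) pairs an instance likely needs to reach.

    Assumption: this list is illustrative, not a complete inventory of
    what the ECS agent talks to.
    """
    # a) the instance must reach ECS to register and receive instructions
    endpoints = [(f"ecs.{region}.amazonaws.com", 443)]
    # b) the instance must reach a container registry to pull images
    if use_ecr:
        endpoints.append((f"api.ecr.{region}.amazonaws.com", 443))
    else:
        endpoints.append(("registry-1.docker.io", 443))
    return endpoints


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and refused or unroutable connections.
        return False


def report(region="us-east-1", use_ecr=False):
    """Map each required host to whether it is reachable from this machine."""
    return {
        host: can_reach(host, port)
        for host, port in required_endpoints(region, use_ecr)
    }
```

Running `print(report())` on an affected instance (for example over SSM or via user data) should show whether the instance has any route out at all. If every entry is False, the problem is probably networking rather than ECS configuration.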

The former option, PrivateLink, doesn't look like something we can or want to generate on the fly, so at current pricing it will cost at least ~$520/year per account, plus ECR storage charges (which should be minimal, in the single digits or tens of dollars, if I'm reading it right). I haven't priced out NAT + load balancer yet. But I suspect that for all but the most frequent users (us), a standing cost of $520 is more than the amount spent on IPv4 addresses in the current setup.
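To make that comparison concrete, here is a back-of-the-envelope sketch. The $520/year figure is the estimate from this comment. The public IPv4 price of $0.005 per address per hour is an assumption based on AWS's published rate and should be checked against current pricing. The function names and the example workload are hypothetical.

```python
STANDING_COST_PER_YEAR = 520.0  # USD/year, PrivateLink estimate from this thread
IPV4_PRICE_PER_HOUR = 0.005     # USD per public IPv4 address per hour (assumed)


def break_even_instance_hours(standing_cost_per_year, ipv4_price_per_hour):
    """Instance-hours per year at which public IPv4 charges equal the standing cost."""
    return standing_cost_per_year / ipv4_price_per_hour


def yearly_ipv4_cost(instances, hours_per_run, runs_per_year, ipv4_price_per_hour):
    """Yearly public IPv4 spend for a workload, one address per instance."""
    return instances * hours_per_run * runs_per_year * ipv4_price_per_hour


def privatelink_is_cheaper(instances, hours_per_run, runs_per_year):
    """True if the standing cost is lower than the IPv4 charges for this workload."""
    ipv4 = yearly_ipv4_cost(
        instances, hours_per_run, runs_per_year, IPV4_PRICE_PER_HOUR
    )
    return STANDING_COST_PER_YEAR < ipv4


if __name__ == "__main__":
    hours = break_even_instance_hours(STANDING_COST_PER_YEAR, IPV4_PRICE_PER_HOUR)
    print(f"Break-even: about {hours:,.0f} instance-hours per year")
    # Hypothetical workload: 100 instances, 10 hours per run, 12 runs per year.
    print(privatelink_is_cheaper(100, 10, 12))
```

Under these assumptions the break-even point is on the order of a hundred thousand instance-hours per year, which is consistent with the suspicion that only very heavy users would come out ahead with a standing charge.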

NAT Gateways might be easier to stand up and then tear down, but that will mean either a) having standing NAT Gateways in certain subnets (which I think is even more expensive than the above) or b) needing the fleet or config file to provide two subnets: the private subnet we want the machines launched in, and the public subnet the NAT Gateway is to be created in (and connected to the private subnet).
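A minimal sketch of what option b) could look like, assuming `ec2` is a boto3 EC2 client (e.g. `boto3.client("ec2")`). The function names and the idea of passing the private subnet's route table ID are hypothetical and not part of Distributed-CellProfiler today. It also assumes the private route table has no existing default route, since `create_route` fails if one is already present.

```python
def create_temporary_nat(ec2, public_subnet_id, private_route_table_id):
    """Create a NAT gateway in the public subnet and route private traffic through it.

    Returns the IDs needed to tear everything down again.
    """
    # A public NAT gateway needs an Elastic IP of its own.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    nat_id = ec2.create_nat_gateway(
        SubnetId=public_subnet_id, AllocationId=allocation_id
    )["NatGateway"]["NatGatewayId"]
    # The gateway takes a while to come up; routes to it fail until then.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    # Send all outbound traffic from the private subnet through the NAT gateway.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return {"nat_id": nat_id, "allocation_id": allocation_id}


def delete_temporary_nat(ec2, private_route_table_id, nat_id, allocation_id):
    """Undo create_temporary_nat so no standing charges remain."""
    ec2.delete_route(
        RouteTableId=private_route_table_id, DestinationCidrBlock="0.0.0.0/0"
    )
    ec2.delete_nat_gateway(NatGatewayId=nat_id)
    # The Elastic IP can only be released once the gateway is fully deleted.
    ec2.get_waiter("nat_gateway_deleted").wait(NatGatewayIds=[nat_id])
    ec2.release_address(AllocationId=allocation_id)
```

Taking the client as a parameter keeps the sketch testable without AWS credentials. In practice the setup call would run before the fleet is requested and the teardown would run from the monitor step, so a failed run could still leave a gateway behind and would need its own cleanup path.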

bethac07, Nov 26 '24 13:11