[ECS] [request]: Editable CapacityProviderReservation AlarmLow duration (is always 15min)

Open monsieurgustav opened this issue 3 years ago • 2 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request EC2 scale-in via an ECS Capacity Provider is too slow due to the hard-coded 15-minute AlarmLow. I wish I could change it to 5 or 10 minutes.

Which service(s) is this request for? ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? I run tasks that execute jobs stored in an SQS queue. A job takes a few minutes to execute. Most of the time there are no jobs at all; occasionally a single job is created at a random time, and sometimes N jobs are created at once. When all jobs are finished, I want to scale in quickly, because it is likely that no new jobs will arrive in the next few minutes.

CAS (ECS cluster auto scaling) automatically creates a "CapacityProviderReservation AlarmLow" alarm that only fires after 15 minutes. Scale-in is therefore very slow compared to a job that takes a few minutes.

Fargate would be an option, but a GPU is required.

Are you currently working around this issue? I'm not; I pay for idle EC2 instances.

monsieurgustav avatar Jan 13 '21 12:01 monsieurgustav

I have exactly the same issue as Guillaume. I'm consuming an SQS queue that receives very sporadic and bursty requests, which sometimes get handled very quickly and sometimes take several minutes. I'm contemplating a CloudFormation custom resource that automatically goes in and hacks the 15 minutes on that alarm down to something more like 5, or even less. My tasks all self-destruct roughly (1 + visibility) minutes after zero messages are found, so I want the container instances to die a minute or three after that.
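For illustration, the hack described above could be sketched roughly as follows in Python with boto3. This is an untested sketch under stated assumptions, not a supported method: the helper names `shortened_alarm_kwargs` and `apply_shortened_alarm` are made up for this example, the alarm name must be looked up by the caller, and the alarm is assumed to be a plain metric alarm. AWS advises against editing alarms that a target tracking policy manages, so the service may overwrite or recreate the alarm at any time.

```python
# Keys that describe_alarms returns and put_metric_alarm also accepts.
# Read-only fields such as AlarmArn or StateValue must not be sent back.
PUT_METRIC_ALARM_KEYS = (
    "AlarmName", "AlarmDescription", "ActionsEnabled", "OKActions",
    "AlarmActions", "InsufficientDataActions", "MetricName", "Namespace",
    "Statistic", "ExtendedStatistic", "Dimensions", "Period", "Unit",
    "EvaluationPeriods", "DatapointsToAlarm", "Threshold",
    "ComparisonOperator", "TreatMissingData",
    "EvaluateLowSampleCountPercentile", "Metrics", "ThresholdMetricId",
)


def shortened_alarm_kwargs(alarm, evaluation_periods):
    """Build put_metric_alarm arguments from an existing alarm description,
    changing only the number of datapoints required to trigger it."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    kwargs = {k: alarm[k] for k in PUT_METRIC_ALARM_KEYS if k in alarm}
    kwargs["EvaluationPeriods"] = evaluation_periods
    kwargs["DatapointsToAlarm"] = evaluation_periods
    return kwargs


def apply_shortened_alarm(alarm_name, evaluation_periods):
    """Hypothetical wrapper: read the alarm, then write it back shortened."""
    import boto3  # imported lazily so the helper above has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    if not alarms:
        raise LookupError(f"no metric alarm named {alarm_name!r}")
    cloudwatch.put_metric_alarm(
        **shortened_alarm_kwargs(alarms[0], evaluation_periods)
    )
```

A custom resource would call something like `apply_shortened_alarm(name, 5)` after the capacity provider is created. Because the alarm is service-managed, the change may not survive later updates to the scaling policy.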

matt-theguyw1cat avatar May 11 '21 00:05 matt-theguyw1cat

Similar use case here. We use EC2-backed ECS because our jobs tend to run for a couple of minutes (more than 2 minutes) and they require GPUs. We have very predictable hourly spikes in traffic. For example, traffic increases by 100x between hh:50 and hh:15 every hour. We want to be able to scale the cluster in, along with all the underlying EC2 instances, immediately after the traffic goes down.

Due to the 15-minute delay in the CloudWatch alarm for the underlying ASG, we have to pay for 15 idle minutes every hour, which is 25% of each hour.
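The 25% figure is simply the alarm delay divided by the length of the traffic cycle. A minimal Python sketch of that arithmetic, assuming instances sit idle for the full alarm delay once per cycle (the function name `idle_fraction` is made up for this example):

```python
def idle_fraction(alarm_delay_minutes, cycle_minutes=60):
    """Share of each traffic cycle spent paying for idle instances
    while waiting for the scale-in alarm to fire."""
    if cycle_minutes <= 0:
        raise ValueError("cycle_minutes must be positive")
    return alarm_delay_minutes / cycle_minutes


if __name__ == "__main__":
    # Current hard-coded delay versus a shorter, configurable one.
    print(f"15-minute alarm: {idle_fraction(15):.1%}")  # 25.0%
    print(f" 5-minute alarm: {idle_fraction(5):.1%}")   # 8.3%
```

Under the same assumption, a 5-minute alarm would cut the idle share from 25% to roughly 8% of each hour.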

Ideally we would like to use a different scaling policy than the ECS managed target tracking policy, but that's not allowed. Alternatively, we'd like to be able to modify the samples needed to trigger the alarm so it's not always 15.

nshi avatar Sep 21 '22 12:09 nshi