[Fargate] [request]: Burstable CPU

Open luhn opened this issue 5 years ago • 35 comments

Tell us about your request
Similar to how t1/t2/t3 instances work, you are allocated a portion of the CPU but can burst to 100%.

Which service(s) is this request for?
Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I've migrated a few lower-traffic services that were running on t2 instances. I didn't think about it at the time, but 25% of a CPU core 100% of the time is very different from 100% of a CPU core 25% of the time. For the more CPU-heavy endpoints, my latency increased substantially and brought the service to a crawl.
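To make the difference concrete, here is a minimal sketch in Python. It assumes an idealized, single-threaded, CPU-bound request and a hypothetical helper `service_time_ms`; real throttling is applied in scheduler periods, so observed latency will be lumpier than this model suggests.

```python
def service_time_ms(cpu_work_ms: float, available_cpu_fraction: float) -> float:
    """Wall-clock time needed to finish `cpu_work_ms` of CPU work when only
    `available_cpu_fraction` of one core is available (idealized model)."""
    if not 0 < available_cpu_fraction <= 1:
        raise ValueError("available_cpu_fraction must be in (0, 1]")
    return cpu_work_ms / available_cpu_fraction


# Hypothetical request that needs 200 ms of CPU time.
capped = service_time_ms(200, 0.25)   # hard cap at 25% of a core
bursting = service_time_ms(200, 1.0)  # allowed to burst to a full core

print(f"capped at 0.25 vCPU: {capped:.0f} ms")
print(f"bursting to 1 vCPU:  {bursting:.0f} ms")
```

Under these assumptions the same request takes four times as long under the hard cap, even though average CPU utilization over the day may be identical in both setups.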

Are you currently working around this issue?
Paying for a full CPU core :(

Additional context

luhn avatar Feb 13 '19 21:02 luhn

Any ETA on this one? We'd be glad to switch to Fargate, but this pricing model makes a huge difference.

effyteva avatar Mar 27 '20 06:03 effyteva

I would also like to see this happen; it is the only thing preventing us from using Fargate.

Sytten avatar Apr 08 '20 22:04 Sytten

This would allow our tasks to start faster. After the task is up and running it no longer requires the same amount of CPU as it takes to prepare the task for operation.

raehalme avatar Apr 11 '20 10:04 raehalme

Start-up time of our services took a hit after moving to Fargate. We would highly appreciate having this feature available.

bogdanbrindusan avatar Nov 02 '20 17:11 bogdanbrindusan

+1 (again)

effyteva avatar Nov 02 '20 17:11 effyteva

This is something we are discussing internally but we don't have a timeline to share. The problem is properly framed and understood.

Thanks for the engagement.

mreferre avatar Nov 04 '20 12:11 mreferre

I'd love this as well. Especially if you had burstable Fargate Spot.

misterjoshua avatar Feb 02 '21 20:02 misterjoshua

Burstable CPU would be ideal. In our case, the issue is only at startup (Spring Boot apps). If there were some way to allocate, say, 1 full CPU for the first minute of the task and then drop down to 1/4 CPU, that would be a good enough solution. Knowing nothing about the internal workings of ECS, I don't know whether such a thing would be any easier to implement, even though it is conceptually simpler. I'm inclined to think ECS uses EC2 under the hood and could somehow take advantage of its burstable instances.

peterlgh7 avatar Apr 05 '21 15:04 peterlgh7

To be clear, this isn't so much about ECS (the control plane) as about how EC2 and Fargate (the data planes) work. When you pick EC2, you will most likely use each instance to support more than one task (and you are probably not planning to use the entire EC2 instance at 100%). This creates opportunities (through reservations and ceilings) to accommodate some bursting for tasks that share the same resources (i.e. the EC2 instance). With Fargate (the way it works today) this is not possible, because the reservation and the ceiling match, which leaves no room for bursting. We are working to allow for this flexibility. More background on how this works can be found here

mreferre avatar Apr 06 '21 10:04 mreferre

Are you saying that you are working to let multiple tasks share the same Fargate host, so that you might be able to binpack them rather than dedicating resources for each task?

Or are you saying to implement t2/t3-style CPU bursting in Fargate, you first need to decouple Fargate host resources and task resources, so that a task might be able to utilize 100% of the bursted CPU?

luhn avatar Apr 06 '21 17:04 luhn

One of the tenets of Fargate is 1 task per "kernel/OS" and we are not compromising on that. So we are looking more at the latter, where a task can consume resources that are not constrained just by the size you picked. One example would be to allow applications to burst at start time to cope with the high CPU usage some apps require to start up. But the details of the how, what and when are still TBD.

mreferre avatar Apr 07 '21 08:04 mreferre

Need this burst at startup. Then low cpu usage and charges at runtime. Don't want to go back to the complexity, additional layer of virtual infrastructure, patching pain and risk, bin packing hassles, and lack of isolation with EC2.

CapnPhatzz avatar Jul 02 '21 19:07 CapnPhatzz

Just migrated a container to fargate, and startup times are really the issue here:

  • 0.5 vCPU: ~70s
  • 1.0 vCPU: ~33s
  • 2.0 vCPU: ~20s

We would rather live with slow startup times, because paying for 2 vCPUs is just too expensive. Kind of disappointing. =/
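As a rough reading of these numbers, the following Python sketch multiplies each task size by its startup time to estimate the vCPU-seconds consumed during startup. The only inputs are the three measurements above; the variable names are illustrative, and the interpretation assumes startup is mostly CPU-bound.

```python
# Startup times reported above, keyed by task size in vCPUs.
startup_seconds = {0.5: 70, 1.0: 33, 2.0: 20}

# Approximate CPU time spent during startup (vCPU-seconds).
cpu_seconds = {vcpu: vcpu * secs for vcpu, secs in startup_seconds.items()}

# Speedup relative to the smallest size.
speedup = {vcpu: startup_seconds[0.5] / secs for vcpu, secs in startup_seconds.items()}

for vcpu in startup_seconds:
    print(f"{vcpu} vCPU: {cpu_seconds[vcpu]:.0f} vCPU-seconds, "
          f"{speedup[vcpu]:.2f}x faster than 0.5 vCPU")
```

The estimate stays in a narrow band (about 33 to 40 vCPU-seconds) across all three sizes, which suggests the startup cost is roughly fixed CPU work. If so, a short burst allowance at startup would cover it without paying for the larger size at steady state.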

iwt-gregorpoloczek avatar Nov 03 '21 10:11 iwt-gregorpoloczek

Thanks @iwt-gregorpoloczek for the feedback. Can I ask you what's the profile of that application? Is it a java app? Something else? Thanks.

mreferre avatar Nov 03 '21 17:11 mreferre

Hey folks, Zach here from the Fargate PM team. I'm working on Fargate's direction around performance. I'm very interested in knowing a few more details here. A few questions:

  1. Other than startup times, what are some use cases and workload types for CPU bursting at startup? I've heard a bunch about Spring Boot. Are there others?
  2. Would CPU bursting at startup (e.g. burst up to 2 CPU for the first 5 minutes) be sufficient? Or are there workload types which could benefit from runtime bursting?
  3. How many CPUs would you need for short periods of time? Is 2 CPUs sufficient (e.g. 0.25, 0.5, and 1 CPU tasks/pods can burst to 2 CPU)?

Thanks!

zachcasper avatar Mar 07 '22 17:03 zachcasper

Would CPU bursting at startup (e.g. burst up to 2 CPU for the first 5 minutes) be sufficient? Or are there workload types which could benefit from runtime bursting?

For my use case, startup bursting wouldn't be sufficient. A full vCPU is overkill for my low-traffic services, however capping at 25% vCPU means a quadrupling of latency for CPU-bound tasks, leading to an unacceptable increase in overall latency for the application. t3 instances are perfect for these workloads because I can burst to a full 2 vCPUs, keeping latency low, but only pay for the low average that I use. I'd like to have the same option in Fargate.
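The burstable behaviour described here can be sketched as a simple credit bucket. This is a simplified illustration in Python, not the exact accounting EC2 uses for T instances: the function `simulate_credits`, the single-core limit, and all parameter values are assumptions made for the example.

```python
def simulate_credits(demand, baseline, max_credits, initial_credits=0.0):
    """Simplified single-core credit bucket.

    demand: CPU wanted per interval, as a fraction of one core (0..1).
    baseline: fraction of a core that is always available.
    Returns the CPU fraction actually delivered in each interval.
    """
    credits = initial_credits
    delivered = []
    for want in demand:
        if want <= baseline:
            # Below baseline: bank the unused share, up to the cap.
            credits = min(max_credits, credits + (baseline - want))
            got = want
        else:
            # Above baseline: spend banked credits on the excess.
            extra = min(want - baseline, credits)
            credits -= extra
            got = baseline + extra
        delivered.append(got)
    return delivered


# Hypothetical low-traffic service: idle for six intervals, then two busy ones.
demand = [0.0] * 6 + [1.0] * 2 + [0.0] * 2
bursty = simulate_credits(demand, baseline=0.25, max_credits=10.0)
hard_cap = [min(want, 0.25) for want in demand]

print("burstable:", bursty)
print("hard cap: ", hard_cap)
print("average delivered (burstable):", sum(bursty) / len(bursty))
```

In this toy run the burstable task gets a full core during both busy intervals while averaging 0.2 of a core overall, which is below the 0.25 baseline. The hard-capped task is held to 0.25 during exactly the intervals where latency matters.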

luhn avatar Mar 07 '22 17:03 luhn

+1 on what @luhn said. We're currently paying for much higher end servers than what we need. T instances would be much better for our use cases (Mostly Web Services), as our traffic changes every hour.

Startup times are a nice bonus, but that's not critical for us at all, 20-70 seconds is fast enough.

effyteva avatar Mar 07 '22 17:03 effyteva

1. Other than startup times, what are some use cases and workload types for CPU bursting at startup?  I've heard a bunch about Spring Boot.  Are there others?

Yes. For our projects:

  • One Node.js-based project downloads large .zip files from S3 and extracts them on bootup, then serves web requests without needing much CPU.
  • Another project clones a large git repository on boot and doesn't need much CPU to serve web requests after that.

2. Would CPU bursting at startup (e.g. burst up to 2 CPU for the first 5 minutes) be sufficient?  Or are there workload types which could benefit from runtime bursting?

Other workload types would benefit:

  • The same Node.js and git-using projects as above re-download/re-clone periodically.
  • We need burst when an admin user rebuilds the site-wide cache or grant tables on a Drupal site hosted in Fargate.
  • The same software needs to burst when a user hits a page for the first time and it isn't in the cache.
  • Some of our Symfony-based apps allow "imports". During an import a lot of data is handled all at once, and we would like burst so that requests don't time out.

3. How many CPUs would you need for short periods of time?  Is 2 CPUs sufficient (e.g. 0.25, 0.5, and 1 CPU tasks/pods can burst to 2 CPU)?

EC2's bursting works great for all of the above. I could see the tasks using one or two vcpus in Fargate for several minutes every hour.

misterjoshua avatar Mar 07 '22 17:03 misterjoshua

My app uses the Meteor Javascript framework, which compiles to a Node server. Generally the server doesn't need much CPU, and the occasional bottleneck isn't a big deal. The issue is that moving connected users over to a new set of instances is a very CPU-heavy operation, and during that time the site is unresponsive for everyone. Note that this is separate from server startup time, which isn't that important to me; it's the process of setting up sessions for a whole bunch of users all at once, and the site being unresponsive for awhile as a result.

Being able to burst to 2 vCPU for the first five minutes would be huge – a vast improvement. For me, that's all I'd need.

(Btw my experience about startup getting bogged down seems to be the norm with Meteor, at least according to posts on the Meteor forum.)

banjerluke avatar Mar 07 '22 19:03 banjerluke

@zachcasper Our app sometimes has to do some image processing requiring a lot of cpu power. This can be short random moments so just boot time won't cover it.

rhertogh avatar Apr 13 '22 20:04 rhertogh

We wanted to move to Fargate. Our startup consists of multiple services. Most of them use very little CPU. The difference is during a deployment. Then for a short period, the CPU usage is much bigger. This was fine on an EC2-based ECS Cluster. Without it the costs of infra would be much higher for us.

miensol avatar Jul 29 '22 11:07 miensol

@miensol Assuming at steady state runtime you could do with a fraction of a CPU (e.g. 0.25 or 0.5), how much CPU power would you need and for how long (during the bootstrap)? In other words what's the "short period" and what's "much bigger" in terms of CPU?

mreferre avatar Jul 29 '22 12:07 mreferre

@mreferre Typically it's 1-2 minutes burst during a deployment. After that, the CPU is mostly idling e.g. <20% for a day until next deployment.

miensol avatar Jul 29 '22 12:07 miensol

Here is a workload that requires runtime bursting. Sometimes auto scaling can't keep up because our product requires 1000x or more throughput in a short period of time.

Our product handles data collection, so it is critical that it doesn't miss any data. I'd therefore like to have burstable CPU as a safeguard alongside auto scaling.

Hoto-Cocoa avatar Oct 05 '22 01:10 Hoto-Cocoa

Being able to burst for up to 5 minutes to 2 vCPU from 0.25, 0.5 or 1 would be just perfect. Our tasks are nice, stable and cheap at runtime, but startup is super slow. My use case is Java & Spring Boot, as mentioned. Thanks!

CapnPhatzz avatar Oct 11 '22 08:10 CapnPhatzz

+1 on this. Warm pools for ECS Fargate would help us with our Windows container workloads, where tasks take almost 5 minutes to start.

chedebala avatar Mar 14 '23 18:03 chedebala

I have an app that runs on WildFly that would start up much faster if the CPU could burst during startup.

In addition to the general startup overhead (similar to a Spring Boot app), the app also runs some config operations which use a Java tool to update its config files. The extra CPU would help these JVM processes complete much faster.

tunaranch avatar Mar 14 '23 23:03 tunaranch

+1, this is a big reason I avoid Fargate.

amexboy avatar Jun 07 '23 15:06 amexboy

Considering moving our stack off ECS because of this... Massively overprovisioned stack just to handle boot/runtime.

duckducknono avatar Jun 15 '23 08:06 duckducknono

This thread has been open for more than 4 years, so it is unlikely this issue will ever be prioritized. For my application Fargate is not cost-effective. Too bad, as I don't look forward to running my own cluster...

frostyone1 avatar Jul 18 '23 11:07 frostyone1