
[Fargate] [request]: Add higher vCPU / Memory options

Open mbj opened this issue 5 years ago • 42 comments

Tell us about your request

Increase maximum allowed vCPU / Memory resources of Fargate tasks.

Which service(s) is this request for?

Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

I want to offload computationally heavy tasks that can only be parallelized locally to Fargate, without having to boot an EC2 instance and take on its associated maintenance overhead.

An example of such a task is the compilation of GHC (the Haskell compiler). Its build system supports parallel computation on a single machine, but not distributed builds.

Are you currently working around this issue?

Considering scripting the use of a bigger EC2 instance, with its associated maintenance overhead.

Additional context

None.

mbj avatar Feb 13 '19 23:02 mbj

Yes, please support more CPU and more EBS. @abby-fuller, does AWS have any plans for this?

fengyj avatar May 15 '19 05:05 fengyj

Any SLA for this? Currently the Fargate implementation provides general-purpose CPU clock speeds (2.2-2.3 GHz), which makes it incapable of running CPU/GPU-critical applications.

srinivaspype avatar Dec 06 '19 06:12 srinivaspype

Please extend the current vCPU limit. It would really help a lot of our customers.

mavericknavs avatar Jul 08 '20 17:07 mavericknavs

Any idea whether other CPU options are even being considered? This has been 'proposed' for a while without feedback from AWS. Would love to have 8/16/32 vCPU options without having to take care of EC2 maintenance.

Djolivald avatar Nov 20 '20 14:11 Djolivald

Recently discovered Fargate, and the idea of not having to manually provision instances, keep the underlying instance up to date, etc. is very attractive. However, a particular workload we have, which involves some ML model evaluation, requires somewhere between 60 and 100 GB of RAM. We will likely fall back to a dedicated ECS/EC2 cluster with autoscaling, but would rather let Fargate handle the process...

luxaritas avatar Jan 06 '21 05:01 luxaritas

I had a similar CPU spike issue when I migrated to Fargate. In the task definition I initially specified 4 vCPUs (the maximum), which was not sufficient for my app. After setting container >> Environment >> cpu to 4096, the CPU spikes are comparatively smaller. This cpu field is optional for the Fargate launch type, but my guess is that without it the 4 vCPUs are shared across the total number of tasks running, though I am not sure. @luxaritas Similarly for memory: setting container >> Environment >> memory to 30 GB (the Fargate maximum at 4 vCPU) might fix your issue.

AWS definitely needs to raise the limit to a minimum of 8 cores.
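
For illustration, a minimal boto3 sketch of those task- and container-level settings (family and image names are hypothetical, not from this thread):

import boto3

ecs = boto3.client("ecs")

# Task-level cpu/memory are required for Fargate; the container-level
# cpu field is optional, as noted above. 4096 CPU units = 4 vCPU.
ecs.register_task_definition(
    family="my-app",                     # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="4096",                          # task level: 4 vCPU (current Fargate max)
    memory="30720",                      # task level: 30 GB (max at 4 vCPU)
    containerDefinitions=[{
        "name": "app",
        "image": "my-app:latest",        # hypothetical image
        "cpu": 4096,                     # container level; optional on Fargate
        "memory": 30720,
    }],
)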

sreeninair avatar Feb 18 '21 03:02 sreeninair

@abby-fuller any update on this request? 8 vcpu would be a start!

andreasunterhuber avatar May 19 '21 10:05 andreasunterhuber

Fargate Compute Optimized CPUs ftw!

ostwalprasad avatar Jun 03 '21 11:06 ostwalprasad

Please increase the maximum allowed vCPU / memory resources of Fargate tasks to at least 8 vCPUs.

harshdubey-sb avatar Jun 07 '21 13:06 harshdubey-sb

Any update on this?

sanjeevpande26 avatar Jun 07 '21 13:06 sanjeevpande26

Can the newly launched AWS App Runner run on more CPUs? Is that an alternative to Fargate?

ostwalprasad avatar Jun 07 '21 13:06 ostwalprasad

Can the newly launched AWS App Runner run on more CPUs? Is that an alternative to Fargate?

No. App Runner supports a subset of the native Fargate options (App Runner can run applications with up to 2 vCPU and 4 GB of memory per instance).

mreferre avatar Jun 07 '21 14:06 mreferre

I actually suspect that App Runner is internally implemented on Fargate, just as Fargate is implemented on EC2, etc.

mbj avatar Jun 07 '21 14:06 mbj

App Runner is indeed built on top of Fargate but (today) it cannot be configured to take advantage of all the CPU/mem configurations "raw" Fargate offers.

mreferre avatar Jun 14 '21 09:06 mreferre

At least 8 vCPUs would be helpful. This is holding us back from migrating fully to Fargate.

sumitverma avatar Jun 26 '21 16:06 sumitverma

I second that; more vCPUs would be very useful.

rokopi-byte avatar Jul 14 '21 10:07 rokopi-byte

Second that; more vCPUs on Fargate would be very useful.

tiivik avatar Jul 14 '21 11:07 tiivik

I see that this issue is categorized in the "Researching" project; is there any estimate of when we might see some additional progress on this task? Would love to deploy one of our core services (with high CPU demand) to Fargate, but will likely only be able to do so once Fargate supports up to 12 vCPUs :)

jojofeng avatar Aug 16 '21 23:08 jojofeng

The compute limits are far too restrictive for many applications (e.g. custom video/audio processing). Until this is addressed, Fargate is not a viable solution, which is a real shame.

peegee123 avatar Sep 13 '21 09:09 peegee123

Thanks @peegee123 for the feedback. We heard this loud and clear and we want to lift this limitation. Stay tuned.

mreferre avatar Sep 13 '21 11:09 mreferre

We are indeed also heavily reliant on Fargate and are constantly maxing out our containers now. We would throw more money your way if we could get more vCPUs (16 maybe) and up to 64 GB of RAM :)!

Some people have asked how we solve it now: https://gist.github.com/alexjeen/984dd2b092ffa49e1c3bf4f6505d0ebe

Basically, we have an ECS Auto Scaling group set to a desired capacity of 0; if we then add a task with a placement constraint of XXL or XXXL, a new EC2 instance is created and the task is placed there.

When the task is done running, the EC2 instance is destroyed, so you basically also only pay for what you use.

When Fargate gets higher vCPU and memory options, we will drop this approach.
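
For illustration, a minimal boto3 sketch of launching such a task (cluster, task definition, and attribute names are hypothetical; see the gist above for the full setup):

import boto3

ecs = boto3.client("ecs")

# Run the heavy task on the EC2-backed capacity provider. The memberOf
# constraint assumes the instances carry a custom ECS attribute "size"
# (e.g. set via ECS_INSTANCE_ATTRIBUTES in the instance user data).
ecs.run_task(
    cluster="my-cluster",                # hypothetical cluster name
    taskDefinition="heavy-task",         # hypothetical task definition
    capacityProviderStrategy=[
        {"capacityProvider": "AsgCapacityProvider", "weight": 1},
    ],
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:size == XXL"},
    ],
)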

alexjeen avatar Dec 08 '21 10:12 alexjeen

The ability to be more flexible with the selections would be nice as well. Currently it is not possible to do something like 2 vCPUs and 2 GB of memory (2 vCPUs require between 4 and 16 GB).

nwsparks avatar Dec 30 '21 13:12 nwsparks

I am using 4 GB of RAM and 4 vCPUs on Fargate. Is it possible to dynamically allocate hardware for every request?

sourav-crossml avatar Jan 20 '22 12:01 sourav-crossml

Are there any updates planned? On other clouds, the serverless Kubernetes offerings can be extended up to 64 cores / 256 GB.

NoahHahm avatar Jan 26 '22 11:01 NoahHahm

Opened in 2019 and not a peep on it so far? :-(

jbidinger avatar Feb 08 '22 22:02 jbidinger

@jbidinger we are actively working on it.

mreferre avatar Feb 09 '22 07:02 mreferre

We are indeed also heavily reliant on Fargate and are constantly maxing out our containers now. We would throw more money your way if we could get more vCPUs (16 maybe) and up to 64 GB of RAM :)!

Some people have asked how we solve it now: https://gist.github.com/alexjeen/984dd2b092ffa49e1c3bf4f6505d0ebe

Basically, we have an ECS Auto Scaling group set to a desired capacity of 0; if we then add a task with a placement constraint of XXL or XXXL, a new EC2 instance is created and the task is placed there.

When the task is done running, the EC2 instance is destroyed, so you basically also only pay for what you use.

When Fargate gets higher vCPU and memory options, we will drop this approach.

This is great; I've begun to implement this since I had similar problems when running a task on FARGATE. However, I'm using AWS CDK, and my current problem is that when I run a task via the AWS CLI, it spins up an EC2 instance and runs the task, but the instance never gets stopped/terminated. I noticed the autoscaling group changing its desired capacity from 0 to 1, and I reckon that this is the problem. Did you encounter a similar problem, and how did you solve it?

For reference, here is a snippet of my AWS CDK code.

# Assumes vpc and cluster are defined elsewhere in the stack.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs

auto_scaling_group = cdk.aws_autoscaling.AutoScalingGroup(self, "MyAsg",
    vpc=vpc,
    instance_type=ec2.InstanceType("t2.xlarge"),
    machine_image=ecs.EcsOptimizedImage.amazon_linux(),
    # or use the Amazon ECS-optimized Amazon Linux 2 AMI:
    # machine_image=ecs.EcsOptimizedImage.amazon_linux2(),
    desired_capacity=0,
    max_capacity=1,
    min_capacity=0,
    new_instances_protected_from_scale_in=False,  # unsure?
    cooldown=cdk.Duration.seconds(30)
)

capacity_provider = ecs.AsgCapacityProvider(self,
    "AsgCapacityProvider",
    auto_scaling_group=auto_scaling_group,
    # if capacity_provider_name is not specified, creation fails;
    # seems like a bug with name references via the construct id
    capacity_provider_name='AsgCapacityProvider',
)
cluster.add_asg_capacity_provider(capacity_provider)

EDIT: After some research I found this article: https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/

In particular step 4, which states that after 15 minutes (i.e. 15 datapoints) the cluster scales in (down). So after waiting 15 minutes I achieve the desired outcome. What I haven't found is how to configure this interval. Ideally I would like a scale-in directly after a task finishes, but if I could configure this to 1-5 minutes I would be happy.
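
One possible immediate workaround (a rough sketch, untested; assumes scale-in protection is disabled) would be to reset the desired capacity yourself once the task finishes:

import boto3

autoscaling = boto3.client("autoscaling")

# Force an immediate scale-in instead of waiting for ECS cluster auto
# scaling's ~15-datapoint CapacityProviderReservation alarm.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="MyAsg",  # hypothetical; CDK generates the physical name unless set explicitly
    DesiredCapacity=0,
    HonorCooldown=False,
)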

tf401 avatar Feb 17 '22 09:02 tf401

Quick update: this is a top priority for us, we're actively developing on it, and will move it to the next roadmap phase within a few weeks.

omieomye avatar Mar 28 '22 19:03 omieomye

Quick update: this is a top priority for us, we're actively developing on it, and will move it to the next roadmap phase within a few weeks.

Would it be possible to provide a rough estimate of when this will be ready and what new memory and vCPU limits you are aiming for?

It would be very useful if this information could be shared (the answer would not be taken as a commitment but rather as an indication).

Thanks in advance.

ghomem avatar Apr 25 '22 09:04 ghomem

Any updates on this? We are starting to consider the move to EC2 because of this limitation.

luiszimmermann avatar Jun 14 '22 19:06 luiszimmermann