containers-roadmap
[Fargate] [request]: Add higher vCPU / Memory options
Tell us about your request
Increase maximum allowed vCPU / Memory resources of Fargate tasks.
Which service(s) is this request for?
Fargate
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to offload computationally heavy tasks that can only be parallelized locally to Fargate, without having to boot an EC2 instance and take on its associated maintenance overhead.
An example of such a task is the compilation of GHC (the Haskell compiler). Its build system allows parallel compilation, but not distributed builds.
Are you currently working around this issue?
Considering scripting the use of a bigger EC2 instance, with its associated maintenance overhead.
Additional context
None.
Yes, please support more CPU and more EBS. @abby-fuller, does AWS have any plan for this?
Any SLA for this? Currently the Fargate implementation provides only general-purpose CPU clock speeds (2.2-2.3 GHz) for us and is not capable of running CPU/GPU-critical applications.
Please extend the current vCPU limit. It will really help a lot of our customers.
Any idea whether other CPU options are even being considered? This has been "proposed" for a while without feedback from AWS. Would love to have 8/16/32 vCPU options without having to take care of EC2 maintenance.
Recently discovered Fargate, and the idea of not having to manually provision instances, keep the underlying instance up to date, etc. is very attractive. However, a particular workload we have, which involves some ML model evaluation, requires somewhere between 60 and 100 GB of RAM. Will likely fall back to a dedicated ECS/EC2 cluster with autoscaling, but would rather let Fargate handle the process...
I had a similar CPU spike issue when I migrated to Fargate. In the task definition I initially specified 4 vCPUs (the maximum resources), which was not sufficient for my app. After setting the cpu field under container >> Environment >> cpu to 4096, the CPU spikes are comparatively smaller. This cpu field is optional for the Fargate launch type, but without it I guess the 4 vCPUs are shared across the total number of running tasks, though I am not sure. @luxaritas Similarly for memory, setting container >> Environment >> memory to 32 GB might fix your issue.
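(For readers who land here later: a minimal CDK sketch of what setting cpu/memory at the container level, in addition to the task level, could look like. The construct names and image are illustrative assumptions, and 4 vCPU / 30 GB reflects the Fargate task maximum at the time of writing.)

from aws_cdk import aws_ecs as ecs

# Task-level limits: 4 vCPU (4096 CPU units) and 30 GB, the current Fargate maximum.
task_definition = ecs.FargateTaskDefinition(
    self, "HeavyTaskDef",
    cpu=4096,
    memory_limit_mib=30720,
)

# Container-level settings mirroring the console's container >> Environment >> cpu/memory
# fields, so the single container is explicitly granted the full task allocation.
task_definition.add_container(
    "app",
    image=ecs.ContainerImage.from_registry("amazonlinux:2"),  # illustrative image
    cpu=4096,
    memory_limit_mib=30720,
)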
AWS definitely needs to upgrade cores to a minimum of 8.
@abby-fuller any update on this request? 8 vcpu would be a start!
Fargate Compute Optimized CPUs ftw!
Increase the maximum allowed vCPU / memory resources of Fargate tasks to a minimum of 8 vCPUs.
Any update on this?
Can the newly launched AWS App Runner run on more CPUs? Is that an alternative to Fargate?
No. App Runner exposes only a subset of the native Fargate options (App Runner can run applications with up to 2 vCPU and 4 GB of memory per instance).
I actually suspect that App Runner is internally implemented on Fargate, much like Fargate is implemented on EC2, etc.
App Runner is indeed built on top of Fargate but (today) it cannot be configured to take advantage of all the CPU/mem configurations "raw" Fargate offers.
At least 8 vCPU would be helpful. This is holding us back from migrating fully to Fargate.
I second that, more vCPU would be very useful
Second that, more vCPU on Fargate would be very useful
I see that this issue is categorized in the "Researching" project; is there any estimate of when we might see some additional progress on this task? Would love to deploy one of our core services (with high CPU demand) to Fargate, but will likely only be able to do so once Fargate supports up to 12 vCPUs :)
The compute limits are far too restrictive for many applications (e.g. custom video/audio processing). Until this is addressed, Fargate is not a viable solution which is a real shame.
Thanks @peegee123 for the feedback. We heard this loud and clear and we want to lift this limitation. Stay tuned.
We are indeed also heavily reliant on Fargate and are constantly maxing out our containers now. We would throw more money your way if we could get more vCPUs (16 maybe) and up to 64 GB of RAM :)!
Some people have asked how we solve it now: https://gist.github.com/alexjeen/984dd2b092ffa49e1c3bf4f6505d0ebe
Basically, we have an ECS Auto Scaling Group set to a desired capacity of 0. Then, if we add a task with a placement constraint of XXL or XXXL, it will create a new EC2 instance for that task and place the task there.
When the task is done running, the EC2 instance is destroyed, so you basically also only pay for what you use.
When Fargate gets higher vCPU and memory options, we would drop this approach.
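(To give the pattern some shape, here is a rough sketch of how such a one-off task launch might look; it is not the gist's actual code. The cluster, task definition, capacity provider, and workload-size attribute names are assumptions, and the container instances are assumed to register that custom attribute.)

import boto3

ecs = boto3.client("ecs")

# Launch a one-off task against an ASG-backed capacity provider whose desired capacity is 0.
# The placement constraint targets instances that register a custom "workload-size" attribute,
# so capacity provider managed scaling spins up a matching EC2 instance on demand.
ecs.run_task(
    cluster="heavy-jobs",              # assumed cluster name
    taskDefinition="xxl-job:1",        # assumed task definition
    count=1,
    capacityProviderStrategy=[
        {"capacityProvider": "AsgCapacityProvider", "weight": 1},
    ],
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:workload-size == XXL"},
    ],
)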
The ability to be more flexible with the selections would be nice as well. Currently it is not possible to do something like 2 cores and 2 GB of memory.
I am using 4 GB RAM and 4 vCPU on Fargate. Is it possible to dynamically allocate hardware for every request?
Are there any updates planned? In other clouds, the serverless Kubernetes offerings can be extended up to 64 cores / 256 GB.
Opened in 2019 and not a peep on it so far? :-(
@jbidinger we are actively working on it.
This workaround is great; I've begun to implement it since I had similar problems when running a task on Fargate. However, I'm using the AWS CDK, and my current problem is that when I run a task via the AWS CLI it spins up an EC2 instance and runs the task, but the instance never gets stopped/terminated. I noticed that the Auto Scaling group changes the desired capacity from 0 to 1, and I reckon that this is the problem. Did you encounter a similar problem, and how did you solve it?
For reference, here is a snippet of my AWS CDK code.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs

# `vpc` and `cluster` (an ecs.Cluster) are defined elsewhere in the stack.
auto_scaling_group = cdk.aws_autoscaling.AutoScalingGroup(
    self, "MyAsg",
    vpc=vpc,
    instance_type=ec2.InstanceType("t2.xlarge"),
    machine_image=ecs.EcsOptimizedImage.amazon_linux(),
    # Or use the Amazon ECS-optimized Amazon Linux 2 AMI:
    # machine_image=ecs.EcsOptimizedImage.amazon_linux2(),
    desired_capacity=0,
    max_capacity=1,
    min_capacity=0,
    new_instances_protected_from_scale_in=False,  # unsure?
    cooldown=cdk.Duration.seconds(30),
)

capacity_provider = ecs.AsgCapacityProvider(
    self, "AsgCapacityProvider",
    auto_scaling_group=auto_scaling_group,
    # If this is not specified, the capacity provider fails to create.
    # Seems like a bug in the name reference via id.
    capacity_provider_name="AsgCapacityProvider",
)
cluster.add_asg_capacity_provider(capacity_provider)
EDIT: After some research I found this article: https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/
In particular step 4, which states that after 15 minutes (or 15 data points) it scales in (down). So after waiting 15 minutes I achieved the desired outcome. What I haven't found is how to configure this interval. Ideally I would like a scale-in directly after a task finishes, but if I could configure this to 1-5 minutes I would be happy.
Quick update: this is a top priority for us, we're actively developing on it, and will move it to the next roadmap phase within a few weeks.
Would it be possible to provide a rough estimation of when this would be ready and what would be the new memory and vCPU limits that you are aiming for?
It would be very useful if this information could be shared (the answer would not be taken as a commitment but rather as an indication).
Thanks in advance.
Any updates on this? We are starting to consider switching to EC2 because of this limitation.