containers-roadmap
AWS Fargate GPU Support: When is GPU support coming to Fargate?
Tell us about your request: What do you want us to build?
Which service(s) is this request for? This could be Fargate, ECS, EKS, ECR
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
Are you currently working around this issue? How are you currently solving this problem?
Additional context: Anything else we should know?
Attachments: If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)
Hi there, can you give us more details about your use case: instance type, CUDA version, and more about what you're trying to do (workload, etc.)? Thanks.
We would like to run object detection on Fargate.
Setup:
- CUDA version: 9.0 or 9.1 (both work)
- Instance type: p2.xlarge
- Algorithm: object detection
- Input: frame
- Output: metadata, preferably JSON with coordinates and confidence
- TPS: 10 frames/sec
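The input/output contract in the setup above can be sketched as follows. Only the JSON output with coordinates and confidence and the 10 frames/sec target come from the post; the `Detection` field names, the `frame_id` key, and the 0.5 confidence cutoff are hypothetical. At 10 frames/sec, processing frames one at a time leaves a budget of about 100 ms per frame.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

TARGET_FPS = 10                      # throughput stated in the setup above
FRAME_BUDGET_MS = 1000 / TARGET_FPS  # 100 ms per frame when frames are processed serially


@dataclass
class Detection:
    """One detected object (hypothetical schema, not from the original post)."""
    label: str
    confidence: float  # detector score in [0, 1]
    x_min: int         # bounding box corners, in pixels
    y_min: int
    x_max: int
    y_max: int


def frame_metadata(frame_id: int, detections: List[Detection], min_confidence: float = 0.5) -> str:
    """Serialize one frame's detections as JSON, dropping boxes below `min_confidence`.

    `frame_id` and the 0.5 default threshold are illustrative assumptions.
    """
    kept = [asdict(d) for d in detections if d.confidence >= min_confidence]
    return json.dumps({"frame_id": frame_id, "detections": kept})
```

For example, `frame_metadata(7, [Detection("car", 0.91, 10, 20, 110, 90)])` returns a JSON string holding a single detection with its box and score.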
Does Fargate have any equivalent of EC2 reserved instance discounts or sustained usage discounts?
No
I have a similar use case. I'd like to run deep learning inference tasks on CUDA-capable GPUs on Fargate (edit: or Lambda), and pay per second of usage.
The specific use case is inference tasks which are run fairly seldom, but need to respond in seconds, rather than minutes. In other words, waiting a few minutes for an EC2 instance to boot up, just doesn't cut the mustard. But neither does the application need to be taking up a GPU 24/7 unproductively, just to run the inference job for a minute or two, twice a day.
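To put a number on this, take the upper end of the pattern described (a two-minute job run twice a day). An always-on GPU instance would then be busy for only

```latex
u = \frac{2 \times 2\,\text{min}}{24 \times 60\,\text{min}} = \frac{4}{1440} \approx 0.28\%
```

of the day, sitting idle roughly 99.7% of the time. That idle share is what per-second billing on a serverless GPU would avoid paying for.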
Edit: By mid-2021, extremely easy quantization and optimization, along with better models, had removed my need for this use case, but I suppose the people who gave this comment a thumbs up might still have something going on in this direction.
I also have an inference use case where we would like to be able to autoscale SQS-driven inference workers in Fargate. We originally tried ECS, but found it too cumbersome to scale both the containers and the EC2 instances, so we are currently just using EC2 instances with an Auto Scaling group. We considered SageMaker, but that would require some engineering effort to adapt our architecture and models.
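One common way to scale SQS-driven workers is to track backlog per worker: read the queue's `ApproximateNumberOfMessagesVisible` attribute and size the fleet so that each worker has at most a target number of queued messages. This is a sketch assuming boto3 for the queue read (the client is passed in so it can be replaced with a stand-in); the target backlog and the 0-20 worker bounds are hypothetical, and applying the result to an ECS service or Auto Scaling group is left out.

```python
import math


def queue_depth(sqs_client, queue_url: str) -> int:
    """Read the approximate number of visible messages from an SQS queue.

    `sqs_client` is expected to be a boto3 SQS client (or an object with the same method).
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessagesVisible"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessagesVisible"])


def desired_worker_count(
    visible_messages: int,
    target_backlog_per_worker: int,
    min_workers: int = 0,
    max_workers: int = 20,
) -> int:
    """Workers needed so each has at most `target_backlog_per_worker` queued messages.

    The min/max bounds are hypothetical defaults; min_workers=0 allows scaling to zero.
    """
    if target_backlog_per_worker <= 0:
        raise ValueError("target_backlog_per_worker must be positive")
    needed = math.ceil(visible_messages / target_backlog_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With 95 queued messages and a target of 10 messages per worker, `desired_worker_count(95, 10)` asks for 10 workers; an empty queue scales down to `min_workers`.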
I'd be interested in this too, and I have use cases similar to those above.
I have a use case for this too: we want to spin up GPU resources for live video streaming of a WebGL application, but be able to relinquish them completely after the stream ends, with minimal startup time or over-metering. In our case, we would need the ability to run an X11 server with GPU hardware acceleration.
@mbnr85 I too am trying to do object detection on Fargate. Is this even possible (for now)? Have you found anything? What did you do in your case?
When training data science models our workloads can take advantage of GPU compute. To start those workloads will run in ECS although eventually we’d likely migrate those to EKS. We’d like to be able to use Fargate to run GPU accelerated workloads but that is not currently supported. Does AWS have GPU compute on the Fargate roadmap, and if so, is there any timeline that can be shared?
Also interested in this for machine learning.
Interested for ML training and inference as well. The overhead of moving to SageMaker is too high, so we just train models on EC2 GPU boxes and then use a CPU runtime for inference on Fargate. However, some models would benefit from a GPU at inference time (namely those trained on CUDA-specific implementations, which we are not using for now because we lack the inference infrastructure). The inference use case is sporadic, so a full-time EC2 box is too pricey.
@romanovzky I guess we're in the same boat. I'm in a similar situation.
I'm also looking forward to this feature.
My use-case:
I need to run jobs that benefit from GPU acceleration (mostly model inference, plus some CPU-bound tasks, e.g. embedding clustering and DB insertions). Each job takes around 10-15 minutes on a p2.xlarge. I receive 100-120 such jobs through the day, with at most 8-10 jobs arriving within a 30-second span.
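Rough arithmetic on these figures, assuming the jobs are spread over a full 24 hours (the post only says "through the day"):

```latex
\begin{aligned}
T_{\text{day}} &= N \cdot d, \qquad N \in [100,\ 120]\ \text{jobs}, \quad d \in [10,\ 15]\ \text{min} \\
T_{\text{day}} &\in [100 \times 10,\ 120 \times 15]\ \text{min} = [1000,\ 1800]\ \text{min} \approx [16.7,\ 30]\ \text{GPU-hours} \\
\bar{G} &= \frac{T_{\text{day}}}{1440\ \text{min}} \in [0.69,\ 1.25]\ \text{GPUs busy on average}
\end{aligned}
```

Bursts of 8-10 jobs within 30 seconds, however, need 8-10 GPUs at once to start without queueing. A fixed fleet sized for the peak would mostly sit idle, while one sized for the average would queue every burst.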
My requirement:
A serverless GPU container solution.
My current solution:
My GPU-utilizing containers run as custom SageMaker training jobs.
Advantages:
- With my increased SageMaker limit on p2.xlarge instances, I can have 20 jobs running in parallel, with zero idle cost. So, sort of serverless GPU containers :)
- Per-second billing.
- My containers have minimal SageMaker-specific code, so they can easily run on EC2, ECS, or even my own desktop.
Disadvantages:
- SageMaker actually spawns a new instance for my container, which results in longer wait times (usually 2x Fargate wait times).
- I need additional logic in my Lambda function, which triggers Fargate jobs and SageMaker jobs separately.
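The Lambda routing described above might look roughly like this. It is a sketch assuming boto3: `create_training_job` (SageMaker) and `run_task` (ECS) are real boto3 calls, but every resource name, ARN and subnet, the container name `job`, the 30-minute runtime cap, and the event shape `{"job_id": ..., "needs_gpu": ...}` are hypothetical placeholders. The clients can be injected so the routing logic can be exercised without AWS.

```python
from typing import Any, Dict

# Hypothetical placeholders: substitute your own cluster, task definition,
# subnets, image, IAM role and bucket.
CLUSTER = "jobs-cluster"
CPU_TASK_DEFINITION = "cpu-job:1"
SUBNETS = ["subnet-0123456789abcdef0"]
GPU_TRAINING_IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/gpu-job:latest"
SAGEMAKER_ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerJobRole"
OUTPUT_S3_PATH = "s3://example-bucket/job-output/"


def launch_job(job: Dict[str, Any], ecs=None, sagemaker=None) -> Dict[str, Any]:
    """Route GPU jobs to SageMaker training jobs and CPU jobs to Fargate tasks.

    `ecs` and `sagemaker` are boto3 clients; they are created lazily if not supplied.
    """
    if job.get("needs_gpu"):
        if sagemaker is None:
            import boto3
            sagemaker = boto3.client("sagemaker")
        response = sagemaker.create_training_job(
            TrainingJobName=job["job_id"],  # must be unique within the account and region
            AlgorithmSpecification={
                "TrainingImage": GPU_TRAINING_IMAGE,
                "TrainingInputMode": "File",
            },
            RoleArn=SAGEMAKER_ROLE_ARN,
            OutputDataConfig={"S3OutputPath": OUTPUT_S3_PATH},
            ResourceConfig={
                "InstanceType": "ml.p2.xlarge",
                "InstanceCount": 1,
                "VolumeSizeInGB": 30,
            },
            StoppingCondition={"MaxRuntimeInSeconds": 30 * 60},  # hypothetical cap
            HyperParameters={"job_id": job["job_id"]},  # values must be strings
        )
        return {"backend": "sagemaker", "arn": response["TrainingJobArn"]}

    if ecs is None:
        import boto3
        ecs = boto3.client("ecs")
    response = ecs.run_task(
        cluster=CLUSTER,
        taskDefinition=CPU_TASK_DEFINITION,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": SUBNETS, "assignPublicIp": "ENABLED"}
        },
        overrides={
            "containerOverrides": [
                {"name": "job", "environment": [{"name": "JOB_ID", "value": job["job_id"]}]}
            ]
        },
    )
    return {"backend": "fargate", "arns": [t["taskArn"] for t in response.get("tasks", [])]}


def handler(event, context):
    """Lambda entry point; assumes an event like {"job_id": "...", "needs_gpu": true}."""
    return launch_job(event)
```

Returning only ARNs (rather than the raw boto3 responses, which contain `datetime` values) keeps the Lambda result JSON-serializable.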
Also, some machine learning models require a GPU for predictions (they will not run on a CPU).
For example, attempting to get RefineNet predictions on a CPU can fail with: InternalError: The CPU implementation of FusedBatchNorm only supports NHWC tensor format for now.
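The error above is about tensor layout. NCHW stores channels before height and width, while NHWC stores them last; per the message, the CPU kernel for `FusedBatchNorm` only accepted NHWC, so a graph whose batch-norm ops were built for NCHW (a layout often chosen for GPU training) fails on CPU. The NumPy sketch below only illustrates the two layouts and the axis permutation between them. Converting input arrays this way does not change the data format recorded in the model's own ops; that typically has to be changed when the model is built or exported.

```python
import numpy as np

# A toy batch: 2 images, 3 channels, 4x5 pixels, stored channels-first (NCHW).
batch_nchw = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)


def nchw_to_nhwc(x: np.ndarray) -> np.ndarray:
    """(N, C, H, W) -> (N, H, W, C): move the channel axis to the end."""
    return np.transpose(x, (0, 2, 3, 1))


def nhwc_to_nchw(x: np.ndarray) -> np.ndarray:
    """(N, H, W, C) -> (N, C, H, W): move the channel axis back to position 1."""
    return np.transpose(x, (0, 3, 1, 2))


batch_nhwc = nchw_to_nhwc(batch_nchw)
print(batch_nchw.shape, "->", batch_nhwc.shape)  # prints (2, 3, 4, 5) -> (2, 4, 5, 3)
```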
I'd also like GPU support in Fargate.
From a Docker container running RStudio, we would like to launch several other containers for distributed deep/machine learning training using Fargate/AWS Batch. The results should be saved to S3 and written back to the RStudio container. Unfortunately, Fargate has no GPU support.
I would also like to launch GPU containers from Fargate. I have two use cases:
1. Spawning powerful deep learning JupyterHub development environments for our machine learning group's researchers, which will disappear effortlessly when the individual JupyterHub kernel is killed.
2. Infrequent, quickly scaled, deep (i.e. GPU use is justified) inference tasks.
A thought: for 2., I hadn't considered the suggestion above of an auto-scaling EC2 group (which would presumably use something like a scripted docker-machine command to provision the instance and launch a kernel container) to run the GPU containers, but it seems like a nasty hack, expensive in both time and money, for something that should be more elegant.
Any news on this?
@ClaasBrueggemann I don't think they will provide this anytime soon. AWS is heavily promoting SageMaker now, and in many/most cases that's the way to go. :)
What about 3D model rendering? We don't need this for machine learning.
+1 for this support.
In that case, maybe a GPU instance such as P2 or G3 would help? I don't believe Amazon will be providing GPUs in Fargate any time soon.
Any SLA for this? The current Fargate implementation gives us general-purpose CPUs at 2.2-2.3 GHz, which are not capable of running CPU/GPU-critical applications.
Fargate does not support GPUs, and we can expect it in the near future.
In Closing: Fargate helped us solve a lot of problems related to real-time processing, including the reduction of operational overhead, for this dynamic environment. We expect it to continue to grow and mature as a service. Some features we would like to see in the near future include GPU support for our GPU-based AI Engines and the ability to cache container images that are larger for quicker “warm” launch times. https://aws.amazon.com/blogs/architecture/building-real-time-ai-with-aws-fargate/
FWIW, it'd be great to run a typical deep learning experiment queue on something like this. Upload code+configs to S3. Lambda picks up, stuffs it into a container, training runs to completion and saves back to S3. Super simple, very scalable.
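The S3-to-container step of this workflow can be sketched as follows: parse the S3 event notification that triggers the Lambda, then hand the uploaded config's location to a training container as environment variables in an ECS `RunTask` `overrides` payload. The environment variable names, the container name `trainer`, and the results prefix are hypothetical. Since Fargate does not offer GPUs, a GPU training task of this kind would need to run on GPU-backed EC2 capacity instead.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def uploaded_objects(event: Dict[str, Any]) -> List[str]:
    """Return s3:// URIs for every object in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def experiment_overrides(
    config_uri: str, results_prefix: str, container_name: str = "trainer"
) -> Dict[str, Any]:
    """Build a RunTask `overrides` payload pointing a (hypothetical) training
    container at its config and at where it should save results."""
    return {
        "containerOverrides": [
            {
                "name": container_name,
                "environment": [
                    {"name": "CONFIG_URI", "value": config_uri},
                    {"name": "RESULTS_PREFIX", "value": results_prefix},
                ],
            }
        ]
    }
```

The payload would then be passed as the `overrides` argument of an ECS `run_task` call, one task per uploaded config.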
Sounds much more like something SageMaker would do.
What is the status of this? I'm very interested in CUDA support in Fargate tasks.
I want to use GPU-optimised faiss training algorithms on Fargate. I'm not training or running a model; I'm just training an HNSW index with faiss.
I have a slightly different use case that doesn't involve AI/ML at all. I need to provide my data science team with GPUs in a serverless context for massive calculations that run better on GPUs than CPUs. They run containers ad hoc, so Fargate makes the most sense: it lets them ship their containers and do whatever they need instead of maxing out their local machines. No other AWS service meets this need without requiring extra operational help, which is what we are trying to avoid so the team can retain ownership of their work.
We would like to be able to use an on-demand GPU with headless Chromium for scheduled jobs that render WebGL image filters implemented as shaders. Currently we use SwiftShader in a Lambda function for this, because we only need to do it a few times a day but need lower latency than an EC2 Auto Scaling group provides. SwiftShader is very slow, however, and its output is not identical to an actual GPU's, causing some image quality issues. GPU support in Fargate would let us spin up on-demand containers to service rendering jobs with higher overall performance than the current Lambda solution, while keeping operational costs aligned with actual usage.
Elastic GPU support in Lambda would be amazing too :)
We have a similar use case to @Zirkonium88
We have a p3.8xlarge instance where our teams use RStudio, and we would like to downsize it considerably by using RStudio's Kubernetes Launcher feature. We are using EKS backed by Fargate to launch our JupyterLab and RStudio sessions, but some of our users will need GPU acceleration for prototyping.