text-generation-inference

AWS Inferentia (inf1, inf2) support

Open OrigamiDream opened this issue 2 years ago • 4 comments

Feature request

I can't find any guidance on integrating HuggingFace TGI with AWS Inferentia. I've found several deployment guides for individual end-to-end models, but none for autoregressive models such as CausalLM.

Therefore, I would like to request support for AWS Inferentia.

Motivation

SageMaker is expensive and rigid, unlike Serverless. Support for inf1 and inf2 instances would reduce the cost of cloud computing.

Your contribution

N/A

OrigamiDream Jul 24 '23 08:07

Thanks. We currently don't support it because, to the best of my knowledge, there is no flash attention on Inferentia, and flash attention is an important piece of TGI.

We have started some work internally for specialized hardware, but it's a sizeable amount of work.

Narsil Jul 25 '23 10:07

The biggest challenge with Inferentia is the missing support for dynamic shapes.
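To illustrate why this matters: hardware that compiles a model ahead of time generally expects fixed input shapes, while text generation produces sequences of varying length. One common workaround is to pad each input up to one of a few precompiled lengths ("buckets"). The following is only a minimal sketch of that idea in plain Python. The bucket sizes, `PAD_ID`, and the function names are hypothetical and are not part of TGI or the AWS Neuron SDK.

```python
from bisect import bisect_left

# Hypothetical bucket lengths; a real deployment would choose these per model.
BUCKETS = [128, 256, 512, 1024]
# Hypothetical padding token id; the real value depends on the tokenizer.
PAD_ID = 0


def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket length that can hold `length` tokens."""
    i = bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"sequence of length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Right-pad `token_ids` to a bucket length.

    Returns (padded_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    target = pick_bucket(len(token_ids), buckets)
    n_pad = target - len(token_ids)
    padded = list(token_ids) + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask


if __name__ == "__main__":
    ids, mask = pad_to_bucket(list(range(1, 131)))  # 130 real tokens
    print(len(ids), sum(mask))
```

Every sequence then arrives with one of a handful of known lengths, at the cost of wasted compute on the padding positions.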

philschmid Jul 25 '23 11:07

This seems to be the AWS Neuron issue tracking dynamic shapes support: https://github.com/aws-neuron/aws-neuron-sdk/issues/564

nikitajz Sep 07 '23 09:09

Another tool from Hugging Face: https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference

muhammad-asn Dec 12 '23 15:12

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] Apr 24 '24 01:04