(sagemaker-model-deployment): Support AWS Inferentia instance types
### Describe the feature

I'd like to be able to use AWS Inferentia instance types for my endpoint inference containers.

### Use Case

### Proposed Solution

No response

### Other Information

No response

### Acknowledgements

- [X] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
Hi @kukushking, we have two samples demonstrating how to deploy models on Inferentia:
- https://github.com/aws-samples/generative-ai-cdk-constructs-samples/tree/main/samples/sagemaker_huggingface_inferentia
- https://github.com/aws-samples/generative-ai-cdk-constructs-samples/tree/main/samples/sagemaker_custom_endpoint
Is there a particular model you are looking to deploy?
Thanks @krokoko. Is this supported for JumpStart Foundation Models? I don't see Inf instance types here.

Ah, sorry, just found it. Never mind, closing the issue.
```
Error: The instance type ml.inf1.2xlarge is not supported. Default instance type: ml.g5.2xlarge. Supported instance types: ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g5.16xlarge.
```
I get this error when deploying Mistral 7B with an Inf instance type. Am I doing something wrong?
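For context, the check that produces this error can be sketched as below. This is a minimal illustration, not the construct's actual code: the function name and structure are made up here, and the supported-instance list is copied verbatim from the error message above.

```python
# Hypothetical sketch of the instance-type validation behind the error above.
# The supported list is taken from the error message; the function name and
# structure are illustrative, not the construct's real implementation.

DEFAULT_INSTANCE_TYPE = "ml.g5.2xlarge"
SUPPORTED_INSTANCE_TYPES = [
    "ml.g5.2xlarge",
    "ml.g5.4xlarge",
    "ml.g5.8xlarge",
    "ml.g5.16xlarge",
]

def validate_instance_type(instance_type: str) -> str:
    """Return the instance type if supported; otherwise raise the error seen above."""
    if instance_type not in SUPPORTED_INSTANCE_TYPES:
        raise ValueError(
            f"The instance type {instance_type} is not supported. "
            f"Default instance type: {DEFAULT_INSTANCE_TYPE}. "
            f"Supported instance types: {', '.join(SUPPORTED_INSTANCE_TYPES)}."
        )
    return instance_type
```

Since `ml.inf1.2xlarge` is absent from the allow-list for this model, the deployment fails before any endpoint is created; supporting Inferentia would mean extending that list for models that publish Neuron-compatible containers.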
Could you please share the code snippet you are using?
Sure, here is the link.
Thanks @kukushking, will add it to the backlog.
This issue is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.
Closing this issue as it hasn't seen activity for a while. Please add a comment @mentioning a maintainer to reopen. If you wish to exclude this issue from being marked as stale, add the "backlog" label.