
sagemaker-hosting-multiple-models: Creating docker image for sagemaker endpoint and deploying multiple models to it

Open zaremb opened this issue 3 years ago • 1 comments

General Information

I have created a Python CDK example of how to host multiple models on the same SageMaker endpoint. It adds the following to the examples repo:

  1. How to create and push a Docker image from a local Dockerfile
  2. How to upload models to S3
  3. How to create a SageMaker role
  4. How to create a SageMaker model from an S3 location
  5. How to create a SageMaker multi-model endpoint configuration
  6. How to deploy the endpoint
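
As a rough sketch of steps 4 to 6, the three SageMaker resources can be written out as the CloudFormation-style properties that a CDK app would ultimately synthesize. This is not the code from the example itself: the function name, resource names, instance type, and every ARN or URI below are placeholders assumed for illustration. The detail that matters is `Mode: "MultiModel"` together with a `ModelDataUrl` that points at an S3 prefix holding many model archives rather than at a single archive.

```python
import json


def multi_model_resources(image_uri, model_prefix_s3, role_arn,
                          name="multi-model-demo",
                          instance_type="ml.m5.large"):
    """Return CloudFormation-style resources for a multi-model endpoint.

    All argument values and the naming scheme are illustrative assumptions.
    """
    # A multi-model container reads archives from a prefix, so keep the slash.
    if not model_prefix_s3.endswith("/"):
        model_prefix_s3 += "/"

    model_name = f"{name}-model"
    config_name = f"{name}-config"

    return {
        # Step 4: the model, backed by the custom image and the S3 prefix.
        "Model": {
            "Type": "AWS::SageMaker::Model",
            "Properties": {
                "ModelName": model_name,
                "ExecutionRoleArn": role_arn,
                "PrimaryContainer": {
                    "Image": image_uri,
                    "Mode": "MultiModel",
                    "ModelDataUrl": model_prefix_s3,
                },
            },
        },
        # Step 5: the endpoint configuration with a single production variant.
        "EndpointConfig": {
            "Type": "AWS::SageMaker::EndpointConfig",
            "Properties": {
                "EndpointConfigName": config_name,
                "ProductionVariants": [{
                    "ModelName": model_name,
                    "VariantName": "AllTraffic",
                    "InitialInstanceCount": 1,
                    "InstanceType": instance_type,
                    "InitialVariantWeight": 1.0,
                }],
            },
        },
        # Step 6: the endpoint itself, which references the configuration.
        "Endpoint": {
            "Type": "AWS::SageMaker::Endpoint",
            "Properties": {
                "EndpointName": f"{name}-endpoint",
                "EndpointConfigName": config_name,
            },
        },
    }


if __name__ == "__main__":
    resources = multi_model_resources(
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
        model_prefix_s3="s3://example-bucket/models",
        role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
    )
    print(json.dumps(resources, indent=2))
```

In a CDK app the same properties would typically be passed to the corresponding L1 constructs instead of being built as dictionaries.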

Additionally, I have added example code so that, apart from creating the required infrastructure, you can actually run some example models and perform inference on sample data.

I have already implemented the feature in Python and am currently reviewing the code and improving the README.

Please take a look at this part of the README.md:

This sample application shows how you can use AWS Cloud Development Kit (AWS CDK) to deploy multiple SageMaker models on a single endpoint. With Amazon SageMaker multi-model endpoints, customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container, needs to be invokable on demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, a traditional endpoint is still the best choice.

Proposed Solution

Please take a look at this part of the README.md:

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When the endpoint is invoked for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.
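
On the client side, the model to run is chosen per request. With boto3's `sagemaker-runtime` client this is the `TargetModel` argument of `invoke_endpoint`, given as the artifact's path relative to the S3 prefix configured on the model. The helper below is a minimal sketch: `invoke_model`, the endpoint name, the artifact name, and the `text/csv` content type are assumptions for illustration, and the client is passed in so the function has no dependency of its own.

```python
def invoke_model(runtime_client, endpoint_name, target_model, payload,
                 content_type="text/csv"):
    """Invoke one specific model hosted on a multi-model endpoint.

    runtime_client is expected to behave like boto3's "sagemaker-runtime"
    client; target_model is the archive name under the model S3 prefix.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,  # selects which model serves the request
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read and decode it into text.
    return response["Body"].read().decode("utf-8")
```

With AWS credentials configured, `runtime_client` would be `boto3.client("sagemaker-runtime")`, and a call might look like `invoke_model(client, "multi-model-demo-endpoint", "model_a.tar.gz", "1.0,2.0,3.0")`. The first request for a given `TargetModel` may take longer than later ones, since the artifact has to be downloaded and loaded before the invocation runs.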

Language

Python

Other information

I have already implemented the feature in Python and am currently reviewing the code and improving the README.

Acknowledge

  • [X] I may be able to implement this feature request
  • [ ] This feature might incur a breaking change

zaremb, Feb 02 '22 11:02

Thank you for the contribution. I see the PR has stalled out -- I'll make sure it gets some attention.

indrora, Jun 10 '22 22:06

Closing as it looks like it got merged.

kaiz-io, Sep 24 '24 21:09
