sagemaker-inference-toolkit
End-to-End example for *NON* multi model deployment
What did you find confusing? Please describe.
Based on the documentation and the complete example multi_model_bring_your_own, it seems that sagemaker-inference-toolkit
is only for multi-model requirements. But I have also seen links to sagemaker_pytorch_serving_container, which suggests that is not the case.
There is no clear instruction in the documentation, or an end-to-end example, indicating that it can also be used for single-model hosting scenarios.
Describe how documentation can be improved
You can provide one more end-to-end example for single-model hosting, along with some points in favor of using this Python package instead of designing our own Docker containers from scratch.
Additional context
Hi @nvs-abhilash, thanks for bringing this to our attention. We are planning to introduce an example of sagemaker-inference-toolkit + multi-model-server.
Meanwhile, you can refer to these instructions to implement your own handler service:
custom handler service: https://github.com/awslabs/multi-model-server/blob/master/docs/custom_service.md#example-custom-service-file
custom inference handler: https://github.com/aws/sagemaker-inference-toolkit/blob/master/README.md#implementation-steps
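For anyone landing here, a minimal single-model setup following those implementation steps might look roughly like the sketch below. This is only an illustration, not an official example: the module name model_handler.py and the joblib-based model loading are assumptions, and you would replace them with whatever fits your framework.

```python
# model_handler.py -- illustrative sketch of a custom inference handler and
# handler service built on sagemaker-inference-toolkit (not an official example).
import os

from sagemaker_inference import decoder, encoder
from sagemaker_inference.default_handler_service import DefaultHandlerService
from sagemaker_inference.default_inference_handler import DefaultInferenceHandler
from sagemaker_inference.transformer import Transformer


class SingleModelInferenceHandler(DefaultInferenceHandler):
    def default_model_fn(self, model_dir):
        # Assumption: the model artifact was saved with joblib as model.joblib.
        import joblib
        return joblib.load(os.path.join(model_dir, "model.joblib"))

    def default_input_fn(self, input_data, content_type):
        # Deserialize the request payload (e.g. CSV or NPY) into an array.
        return decoder.decode(input_data, content_type)

    def default_predict_fn(self, data, model):
        # Run inference with the loaded model.
        return model.predict(data)

    def default_output_fn(self, prediction, accept):
        # Serialize the prediction into the requested content type.
        return encoder.encode(prediction, accept)


class HandlerService(DefaultHandlerService):
    def __init__(self):
        transformer = Transformer(default_inference_handler=SingleModelInferenceHandler())
        super(HandlerService, self).__init__(transformer=transformer)
```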
Hi, my confusion is not about how to use it with Multi Model Server; I think that is very well documented in the MMS SageMaker example.
What we don't have is an example for "Bring Your Own Model" using sagemaker-inference-toolkit.
Without that, it looks as if sagemaker-inference-toolkit can only work with multi-model-server.
Maybe a sister example to scikit_bring_your_own, but using sagemaker-inference-toolkit.
Let me know if that makes sense.
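To make that concrete, a "bring your own container" serving entrypoint for a single model could be as small as the sketch below. Again this is a hedged sketch, not a confirmed example: the file name serve.py is an assumption, and the handler-service string would point at whichever module defines your HandlerService (model_handler in the earlier sketch).

```python
# serve.py -- illustrative serving entrypoint for a single-model container.
# It starts multi-model-server through sagemaker-inference-toolkit with one
# handler service, which works for single-model endpoints as well.
from sagemaker_inference import model_server

# Assumption: model_handler.py (sketched above) is on the PYTHONPATH inside
# the container, so MMS can import it as the handler service.
HANDLER_SERVICE = "model_handler"

if __name__ == "__main__":
    model_server.start_model_server(handler_service=HANDLER_SERVICE)
```

The container's Dockerfile would then roughly pip install multi-model-server and sagemaker-inference, copy these two files in, and set serve.py as the ENTRYPOINT; the endpoint itself is created the same way as for any other bring-your-own container.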
I am looking for the same information and agree that such an example would be useful.
I've been struggling with the same issue. Any update on this?
Hi,
I've also been looking for examples of applying sagemaker-inference-toolkit to a single-model deployment that does not use PyTorch. The best example, which might also benefit others, is the following repo from an AWS workshop: https://github.com/awslabs/amazon-sagemaker-mlops-workshop.
This is a particular issue when trying to host a model on the new serverless inference variant, which does not support multi-model hosting (see the excluded features). I am trying to host the model with my own Flask API for now.
Hi, I have the same issue now. It has been a few years. Any updates?