Cannot deploy Huggingface model onto serverless endpoint
Describe the bug

When trying to deploy my Huggingface model through:

```python
predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    serverless_inference_config={
        "MemorySizeInMB": 1024,
        "MaxConcurrency": 2,
    },
)
```
I get the following error:

```
File "/XXX/lib/python3.9/site-packages/sagemaker/huggingface/model.py", line 271, in deploy
    if not self.image_uri and instance_type.startswith("ml.inf"):
AttributeError: 'NoneType' object has no attribute 'startswith'
```
I think this is because the Huggingface deploy method currently assumes that an instance type is given (it isn't ready for serverless deployment yet). In the serverless case instance_type is None, but deploy calls string methods on instance_type here:
https://github.com/aws/sagemaker-python-sdk/blob/f3c2d7ec56fb63878da978c1e58caf3771999218/src/sagemaker/huggingface/model.py#L271
Maybe a simple `not is_serverless and` at the start of this if statement would fix this, along the lines of the sketch below? Or am I being dense?
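To make the suggestion concrete, here is a minimal sketch of the guard I have in mind (`is_serverless` is a hypothetical local flag derived from serverless_inference_config, not an existing variable in model.py):

```python
# Hypothetical sketch of a fix inside HuggingFaceModel.deploy();
# is_serverless is an assumed flag, not an existing variable in model.py.
is_serverless = serverless_inference_config is not None
if not is_serverless and not self.image_uri and instance_type.startswith("ml.inf"):
    ...  # existing logic that selects the Inferentia inference image
```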
Thanks!
Hi,
You need to provide an instance_type; its default value is None, which is why you are getting the error `AttributeError: 'NoneType' object has no attribute 'startswith'`.
See the deploy method signature in the doc here.
```python
deploy(initial_instance_count=None, instance_type=None, serializer=None, deserializer=None, accelerator_type=None, endpoint_name=None, tags=None, kms_key=None, wait=True, data_capture_config=None, async_inference_config=None, serverless_inference_config=None, **kwargs)
```
You can find the list of available instance types here: https://aws.amazon.com/sagemaker/pricing/
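For example, a non-serverless deployment with an explicit instance type might look like this (the instance type below is just an illustration, any supported SageMaker instance type works):

```python
# Sketch of a provisioned (non-serverless) deployment; ml.m5.xlarge is
# an arbitrary example instance type.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
)
```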
But if I want to make a serverless endpoint (as described here - https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-serverless-inference/), then I cannot supply an instance type, as this option explicitly has no defined instance.
In the AWS tutorial for creating a serverless endpoint (https://aws.amazon.com/blogs/machine-learning/deploying-ml-models-using-sagemaker-serverless-inference-preview/), under the heading "Endpoint configuration creation", no instance_type is required:
```python
endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
            },
        },
    ],
)
```
I should be able to do this through HuggingFaceModel.deploy() too (roughly as in the sketch below), but it seems the API hasn't been updated to support this (relatively new) feature yet.
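For reference, this is roughly the call I would expect to work, using the ServerlessInferenceConfig class from sagemaker.serverless (assuming an SDK version in which HuggingFaceModel.deploy() supports it):

```python
from sagemaker.serverless import ServerlessInferenceConfig

# Sketch of the expected SDK-level serverless deployment; assumes an SDK
# version where deploy() accepts a serverless config instead of an
# instance type.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=1,
)
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name=endpoint_name,
)
```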
Thanks for clarifying that you want to deploy in serverless mode.
In your case, you need to provide an image_uri. See how the image_uri is retrieved in the "Setup and training" section and used in the "Model creation" section of this tutorial: https://aws.amazon.com/blogs/machine-learning/deploying-ml-models-using-sagemaker-serverless-inference-preview/.
Here's an example of retrieving the URI for Hugging Face with TensorFlow as the base framework:

```python
import sagemaker

# Retrieve the Hugging Face TensorFlow inference image URI for a given
# region, framework version, Python version, and instance type.
image_uri = sagemaker.image_uris.retrieve(
    framework="huggingface",
    region="eu-west-1",
    version="4.6.1",
    py_version="py37",
    image_scope="inference",
    instance_type="ml.m5.2xlarge",
    base_framework_version="tensorflow2.4.1",
)
# gives a URI such as:
# '763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-tensorflow-inference:2.4.1-transformers4.6.1-cpu-py37-ubuntu18.04'
```
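The retrieved URI can then be passed to the model explicitly, so deploy() does not need to infer an image from an instance type. A minimal sketch (the model_data S3 path and role are placeholders you would replace with your own):

```python
from sagemaker.huggingface import HuggingFaceModel

# Sketch: pass the retrieved image URI explicitly; model_data and role
# below are hypothetical placeholders.
huggingface_model = HuggingFaceModel(
    image_uri=image_uri,
    model_data="s3://my-bucket/model.tar.gz",
    role=role,
)
```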
@Peter-Devine does this solve your issue?
@Peter-Devine We are closing this issue due to inactivity. Please feel free to reopen it if the suggested solution doesn't solve the issue for you. Thanks!