
An easier equivalent to the removed update_endpoint argument

Open athewsey opened this issue 4 years ago • 12 comments

Describe the feature you'd like

A direct/simple way to update an existing endpoint to a new model version (created e.g. by Model() constructor or Estimator.fit()).

Per the SDK v2 migration doc, Estimator.deploy() and Model.deploy() have had their update_endpoint argument removed and raise an error when called with an existing endpoint name. Users are advised to use Predictor.update_endpoint() instead.

The problem is that the update_endpoint() method takes the name of an existing SageMaker Model as a parameter and, per #1094, there's no easy SDK way to register a Model in the API given a Model object or a trained Estimator.

How would this feature be used? Please describe.

When a user has re-trained an Estimator or created a new Model object in the SDK, they'll be able to easily update an existing endpoint - like they would have done in v1 with Model.deploy(..., update_endpoint=True).

Describe alternatives you've considered

The implementation could maybe proceed as:

  • Reinstate the update_endpoint parameter to enable the old one-line flow
  • Add a method on Model (and maybe Estimator too?) to register the Model in the SageMaker API
  • Something else?

Additional context

As used in, for example, the amazon-sagemaker-analyze-model-predictions sample.

It'd be great to know if I'm just missing an easy way to use Predictor.update_endpoint() for this!

athewsey avatar Sep 23 '20 01:09 athewsey

An example flow I got working for now, which uses private/internal functions and repeats the instance type way too much:

sagemaker_model._init_sagemaker_session_if_does_not_exist('ml.m5.xlarge')
sagemaker_model._create_sagemaker_model('ml.m5.xlarge')
predictor.update_endpoint(
    model_name=sagemaker_model.name,
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
)

...Speaking of which, it seems weird to me that initial_instance_count and instance_type are required params on the predictor call when the model_name is specified, but not otherwise? Can't it just default to the existing endpoint instance params as it would in the case where model_name wasn't changed?
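To sketch the defaulting behavior I'd expect (this helper and its shape handling are my own illustration, not SDK API): derive the instance parameters from the existing endpoint config, letting the caller override only what changes.

```python
def variant_settings(endpoint_config, overrides=None):
    """Derive update_endpoint() instance params from an existing endpoint
    config (a DescribeEndpointConfig-shaped dict), applying any overrides.
    Assumes a single production variant."""
    variant = endpoint_config["ProductionVariants"][0]
    settings = {
        "initial_instance_count": variant["InitialInstanceCount"],
        "instance_type": variant["InstanceType"],
    }
    settings.update(overrides or {})
    return settings

# Example with a DescribeEndpointConfig-style response:
existing = {
    "ProductionVariants": [
        {"VariantName": "AllTraffic",
         "InitialInstanceCount": 1,
         "InstanceType": "ml.m5.xlarge"}
    ]
}
print(variant_settings(existing))
# → {'initial_instance_count': 1, 'instance_type': 'ml.m5.xlarge'}
```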

athewsey avatar Sep 23 '20 02:09 athewsey


I am also encountering a similar issue. However, I'm actually having a hard time finding the model name when using the SDK. How have you gone about doing this? I was not able to locate a place where the estimator or its associated training jobs keep track of the created model, but I may just be missing it.

kenanzh avatar Sep 24 '20 21:09 kenanzh

@kenanzh when you call Estimator.deploy(), it actually creates three resources behind the scenes, which you can see in the SageMaker console: a Model, an Endpoint Configuration, and an Endpoint.

In my example I was explicitly creating an SDK 'Model' object. I think you should be able to get the equivalent of my sagemaker_model by calling Estimator.create_model(...).

Note that creating a PyTorchModel (or the equivalent for other frameworks) in the SDK does not actually register it in the SageMaker API, which is why I called the internal sagemaker_model._create_sagemaker_model('ml.m5.xlarge') above. Creating the "real" model in the SageMaker API requires knowing the instance type (because most frameworks have different images for GPU vs CPU), so normally this happens when you call Model.transformer() or Model.deploy(). The sagemaker_model.name property will be empty until the "real" model has been created in the API.
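To illustrate why the instance type is needed before the model can be registered, here's a toy sketch of the idea (not the SDK's actual image-resolution logic): accelerated instance families get a GPU image, everything else a CPU one.

```python
def pick_image(instance_type, cpu_image_uri, gpu_image_uri):
    """Toy illustration: frameworks resolve different container images
    depending on the instance family. GPU families start with 'p' or 'g'
    (e.g. ml.p3.*, ml.g4dn.*); others get the CPU image."""
    family = instance_type.split(".")[1]
    return gpu_image_uri if family[0] in ("p", "g") else cpu_image_uri

print(pick_image("ml.p3.2xlarge", "cpu-img", "gpu-img"))  # → gpu-img
print(pick_image("ml.m5.xlarge", "cpu-img", "gpu-img"))   # → cpu-img
```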

athewsey avatar Sep 25 '20 02:09 athewsey

@athewsey Thanks for using our product and the suggestion. We will have a discussion about this feature request.

icywang86rui avatar Sep 30 '20 18:09 icywang86rui

This is a missing feature, and a very important one; please update.

kuirensu avatar Oct 20 '20 03:10 kuirensu

Any update/workaround regarding this one?

amitm-sundaysky avatar Dec 02 '20 15:12 amitm-sundaysky

+1. Whatever good reason there is behind removing the update_endpoint arg, the migration doc reads like "go figure it out yourself", which does not really help with the transition from v1 to v2. I would expect at least some example of how to perform the same function in v2, or a revert of this change if it is not really necessary.

bill10 avatar Dec 28 '20 18:12 bill10

Here is how to create a model and update an existing endpoint

Create model using sagemaker session

You can create the model using a SageMaker session. Depending on whether you have a BYO model or an existing training job, choose one of the following methods to create the container definition.

BYO- Create model

The model is trained outside SageMaker, e.g. a pretrained model.

Step 0 - Prerequisite for BYO: package your model correctly. Make sure the model_data_url is packaged according to create-the-directory-structure-for-your-model-files and uploaded to S3. Also, thanks to Joao Moura: add the SAGEMAKER_PROGRAM environment variable so that SageMaker knows the entry point.
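As an illustration of that packaging step (file names here are placeholders; check the linked docs for your framework's exact layout), recent PyTorch inference containers expect the weights at the archive root and the entry point script under code/ inside model.tar.gz:

```python
import os
import tarfile
import tempfile

# Build a model.tar.gz with the layout described above
# (model.pth and inference.py are placeholder names).
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "code"))
open(os.path.join(workdir, "model.pth"), "wb").close()            # trained weights
open(os.path.join(workdir, "code", "inference.py"), "w").close()  # entry point

archive = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(workdir, "model.pth"), arcname="model.pth")
    tar.add(os.path.join(workdir, "code"), arcname="code")

with tarfile.open(archive) as tar:
    print(sorted(tar.getnames()))  # → ['code', 'code/inference.py', 'model.pth']
# Then upload, e.g.: aws s3 cp model.tar.gz s3://<bucket>/<prefix>/model.tar.gz
```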

import datetime

import sagemaker

# Retrieve the inference image URI for a GPU instance for PyTorch 1.4.0
image_uri = sagemaker.image_uris.retrieve(
    "pytorch", "us-east-2", version="1.4.0", py_version="py3",
    instance_type="ml.p3.2xlarge", image_scope="inference",
)

# Define the container def
# (model_data_url, my_inference_entry_point, and role are assumed to be defined)
container_def = sagemaker.session.container_def(
    image_uri, model_data_url, env={"SAGEMAKER_PROGRAM": my_inference_entry_point}
)

# Create the model in the SageMaker API
new_model_name = "my-new-model-{}".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
sm_session = sagemaker.session.Session()
sm_session.create_model(new_model_name, role, container_def)

Existing training job - Create model

import datetime

import sagemaker
from sagemaker.pytorch.estimator import PyTorch

# Retrieve the inference image URI for a GPU instance for PyTorch 1.4.0
image_uri = sagemaker.image_uris.retrieve(
    "pytorch", "us-east-2", version="1.4.0", py_version="py3",
    instance_type="ml.p3.2xlarge", image_scope="inference",
)

# Attach to the existing training job
# (training_job_name, my_inference_entry_point, and role are assumed to be defined)
estimator = PyTorch.attach(training_job_name)

# Construct the PyTorch model object
new_model_name = "my-new-model-{}".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
model = estimator.create_model(
    name=new_model_name, entry_point=my_inference_entry_point, image_uri=image_uri
)

# Prepare the container def, packaging the entry point file together with the model
container_def = model.prepare_container_def()

# Create the model in the SageMaker API
sm_session = sagemaker.session.Session()
sm_session.create_model(new_model_name, role, container_def)

Update Endpoint

Once the model is created, update the existing endpoint:

from sagemaker.pytorch.model import PyTorchPredictor

predictor = PyTorchPredictor(existing_endpoint_name)
predictor.update_endpoint(
    initial_instance_count=1, instance_type="ml.p3.2xlarge", model_name=new_model_name
)

elangovana avatar May 13 '21 19:05 elangovana

Has anyone found a Tensorflow solution to this problem?

dan9059021 avatar Dec 15 '21 17:12 dan9059021

Has anyone found a solution for this ?

ghost avatar Apr 19 '22 09:04 ghost

I have found a solution to this by rewriting the whole deployment script using the boto3 SDK rather than the SageMaker SDK. The full code is in the accepted Stack Overflow answer:

https://stackoverflow.com/questions/73728499/how-to-update-an-existing-model-in-aws-sagemaker-2-0/73825605#73825605

You can indeed rewrite the code and supply an entry point as well as a source directory of other code dependencies using the boto3 SageMaker client. The documentation just doesn't state it, unfortunately.
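For reference, a minimal sketch of the boto3-level flow (create_model → create_endpoint_config → update_endpoint). The helper below only builds the request payloads so the shape is easy to see; all names (endpoint, bucket, role, config suffix) are placeholders, and the SAGEMAKER_PROGRAM environment variable is what tells the framework container which script to run.

```python
def build_update_requests(endpoint_name, model_name, image_uri, model_data_url,
                          role_arn, entry_point, instance_type, instance_count=1):
    """Build the three boto3 SageMaker request payloads needed to roll an
    existing endpoint onto a new model. Sketch only: single variant, and the
    endpoint-config name is derived from the model name by convention."""
    config_name = model_name + "-config"
    return {
        "create_model": {
            "ModelName": model_name,
            "PrimaryContainer": {
                "Image": image_uri,
                "ModelDataUrl": model_data_url,
                # Entry point inside model.tar.gz (under code/)
                "Environment": {"SAGEMAKER_PROGRAM": entry_point},
            },
            "ExecutionRoleArn": role_arn,
        },
        "create_endpoint_config": {
            "EndpointConfigName": config_name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }],
        },
        "update_endpoint": {
            "EndpointName": endpoint_name,
            "EndpointConfigName": config_name,
        },
    }

# Usage (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# reqs = build_update_requests("my-endpoint", "my-model-v2", image_uri,
#                              "s3://my-bucket/model.tar.gz", role_arn,
#                              "inference.py", "ml.m5.xlarge")
# sm.create_model(**reqs["create_model"])
# sm.create_endpoint_config(**reqs["create_endpoint_config"])
# sm.update_endpoint(**reqs["update_endpoint"])
```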

basselmasri avatar Sep 26 '22 08:09 basselmasri

Does this request still exist?

liujiaorr avatar Apr 29 '24 19:04 liujiaorr