sagemaker-python-sdk
model.deploy to allow for auto scale configuration
Describe the feature you'd like
Today we deploy a model like so:
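from sagemaker.sklearn import SKLearn

# script_path, role, sagemaker_session and SKLearnPredictorJson are defined elsewhere
model = SKLearn(
    entry_point=script_path,
    framework_version="0.20.0",
    py_version="py3",
    instance_type="ml.m5.2xlarge",
    role=role,
    sagemaker_session=sagemaker_session,
    dependencies=[...],
)
# SKLearn is an estimator, so it must be trained with model.fit(...) before deploy();
# that step is not shown here
predictor = model.deploy(
    endpoint_name="some_name",
    initial_instance_count=1,
    instance_type="ml.m5.large",
    predictor_cls=SKLearnPredictorJson,
)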
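Auto scaling is attached to the endpoint's production variant, not to the model object. When deploy() is called without an explicit variant name, the SDK names the single variant "AllTraffic", which is why the Application Auto Scaling resource ID has the form endpoint/<endpoint name>/variant/AllTraffic. The helper below is a hypothetical illustration (not an SDK function) of that mapping, assuming the default variant name.

```python
def variant_resource_id(endpoint_name: str, variant_name: str = "AllTraffic") -> str:
    """Return the Application Auto Scaling ResourceId for one endpoint variant.

    Hypothetical helper; "AllTraffic" is assumed because it is the variant
    name the SDK's deploy() uses when none is given.
    """
    if not endpoint_name:
        raise ValueError("endpoint_name must be a non-empty string")
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


print(variant_resource_id("some_name"))  # endpoint/some_name/variant/AllTraffic
```

With SDK v2, the endpoint name can be read from the returned predictor, e.g. variant_resource_id(predictor.endpoint_name).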
How would this feature be used? Please describe.
When calling model.deploy, it would be ideal to have a way to set an auto scaling policy (similar to how we can set initial_instance_count).
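Until something like this exists in the SDK, the requested behaviour can be approximated on the caller's side. Below is a minimal sketch of a hypothetical deploy_with_autoscaling helper (not part of the SageMaker Python SDK) that chains model.deploy(...) with the two Application Auto Scaling calls. It assumes that deploy() blocks until the endpoint is InService (the SDK's default wait=True), that the variant keeps the default name "AllTraffic", and that the policy name format is arbitrary. The client is passed in, e.g. boto3.client("application-autoscaling"), so the helper itself has no AWS dependency.

```python
from typing import Any

SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def deploy_with_autoscaling(
    model: Any,
    autoscaling_client: Any,
    *,
    endpoint_name: str,
    min_capacity: int,
    max_capacity: int,
    target_invocations_per_instance: float,
    variant_name: str = "AllTraffic",  # assumed SDK default variant name
    **deploy_kwargs: Any,
) -> Any:
    """Hypothetical helper: deploy a model, then attach target-tracking scaling."""
    if not 1 <= min_capacity <= max_capacity:
        raise ValueError("need 1 <= min_capacity <= max_capacity")

    # Start at the lower bound so the initial count lies inside the scaling range.
    predictor = model.deploy(
        endpoint_name=endpoint_name,
        initial_instance_count=min_capacity,
        **deploy_kwargs,
    )

    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=SCALABLE_DIMENSION,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",  # hypothetical naming
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=SCALABLE_DIMENSION,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return predictor
```

Remaining deploy() arguments (instance_type, predictor_cls, ...) pass through deploy_kwargs; initial_instance_count is derived from min_capacity, so it should not be passed separately.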
Describe alternatives you've considered
I'm still researching whether I can use the SKLearn class while also using boto3 to attach a policy.
Hi @allcentury, thanks for the recommendation. We'll add this to our backlog. In the meantime, you can use boto3 to attach the scaling policy:
- https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-add-code-apply.html
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/application-autoscaling.html#ApplicationAutoScaling.Client.put_scaling_policy
For anyone new to this (as I was), here is what I had to do:
import boto3

client = boto3.client('application-autoscaling')

# The endpoint name passed to model.deploy(...)
endpoint_name = "some_name"
# deploy() creates a single production variant named "AllTraffic" by default
resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=4,
    MaxCapacity=50,
    # Optional for SageMaker: Application Auto Scaling falls back to its
    # service-linked role when RoleARN is omitted
    RoleARN=role,
    SuspendedState={
        'DynamicScalingInSuspended': False,
        'DynamicScalingOutSuspended': False,
        'ScheduledScalingSuspended': False,
    },
)

# check the target is registered
response = client.describe_scalable_targets(
    ServiceNamespace='sagemaker',
    ResourceIds=[resource_id],
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
)
print(response['ScalableTargets'])

# Target tracking: aim for about 150 invocations per instance per minute
client.put_scaling_policy(
    PolicyName='autoscale-policy',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 150.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
        'ScaleOutCooldown': 300,  # seconds
        'ScaleInCooldown': 300,  # seconds
    },
)
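One follow-up that is easy to miss when cleaning up: the SageMaker documentation advises deregistering the variant as a scalable target before deleting an endpoint that uses auto scaling, and deregister_scalable_target also removes the scaling policies attached to that target. A minimal sketch, assuming the default "AllTraffic" variant and a predictor object exposing delete_endpoint() (as the SDK's predictors do); the helper name teardown_autoscaled_endpoint is hypothetical.

```python
from typing import Any


def teardown_autoscaled_endpoint(
    autoscaling_client: Any,
    predictor: Any,
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumed SDK default variant name
) -> None:
    """Hypothetical helper: stop auto scaling, then delete the endpoint."""
    # Deregistering the target also drops its scaling policies, so no
    # separate delete_scaling_policy call is made here.
    autoscaling_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )
    # Only delete the endpoint once auto scaling no longer manages it.
    predictor.delete_endpoint()
```

Usage: teardown_autoscaled_endpoint(client, predictor, "some_name"), reusing the application-autoscaling client created above.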