sagemaker-python-sdk
ValueError: instance_type should not be a pipeline variable in SKLearnProcessor
Describe the bug: SKLearnProcessor (from sagemaker.sklearn.processing) raises a ValueError when a sagemaker.workflow.parameters.ParameterString is passed as instance_type. I have been running the exact same script and never had this issue before.
To reproduce
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)
from sagemaker.sklearn.processing import SKLearnProcessor

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")
framework_version = "0.23-1"

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name="sk_preprocess",
    role=role,
)
Screenshots or logs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-9898fc9aebc0> in <module>
8 instance_count=processing_instance_count,
9 base_job_name="sk_preprocess",
---> 10 role=role,
11 )
/opt/conda/lib/python3.7/site-packages/sagemaker/sklearn/processing.py in __init__(self, framework_version, role, instance_type, instance_count, command, volume_size_in_gb, volume_kms_key, output_kms_key, max_runtime_in_seconds, base_job_name, sagemaker_session, env, tags, network_config)
89
90 image_uri = image_uris.retrieve(
---> 91 defaults.SKLEARN_NAME, region, version=framework_version, instance_type=instance_type
92 )
93
/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
115 for name, val in args.items():
116 if is_pipeline_variable(val):
--> 117 raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
118
119 if is_jumpstart_model_input(model_id, model_version):
ValueError: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)
System information
- SageMaker Python SDK version: 2.94.0
- I am working in a SageMaker Studio notebook
@zerualem I had this problem too but there was a new release a few min ago that fixed it (2.97.0).
@bobbywlindsey thanks for the suggestion. After upgrading to SageMaker 2.97, I now get a warning instead of an error.
WARNING:root:instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>). The default_value of this Parameter object will be used to override it. Please remove this pipeline variable and use python primitives instead.
Hi @zerualem and @bobbywlindsey, sorry for the confusing warning message. I've opened a PR (see above) to improve the warning message.
FYI: the warning you're seeing is emitted when retrieving the image_uri via instance_type.
- If no image_uri is passed to the SKLearnProcessor, the default value of instance_type (a plain string) is used to retrieve the image_uri for the processor/estimator. This part (retrieving the image_uri based on instance_type) cannot be parameterized unless the user directly passes in the image_uri as a ParameterString.
- On the other hand, the instance_type of a processor/estimator itself can be parameterized, e.g. by giving it a ParameterString. This behavior still works.
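To make the two points above concrete, here is a minimal, standard-library-only sketch of the behavior as described in this thread. It does not use the SDK: FakeParameterString and resolve_for_image_lookup are hypothetical stand-ins invented for illustration, and the fallback logic is an assumption modeled on the 2.97.0 warning text, not the SDK's actual code.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("image_uri_sketch")


@dataclass
class FakeParameterString:
    """Hypothetical stand-in for sagemaker.workflow.parameters.ParameterString."""

    name: str
    default_value: str


def resolve_for_image_lookup(name, value):
    """Return a plain string that an image URI lookup could use.

    Assumed behavior, modeled on the warning text quoted in this thread:
    a pipeline variable is replaced by its default_value and a warning
    is logged. Plain strings pass through unchanged.
    """
    if isinstance(value, FakeParameterString):
        logger.warning(
            "%s should not be a pipeline variable (%s). "
            "The default_value of this Parameter object will be used to override it.",
            name,
            type(value),
        )
        return value.default_value
    return value


param = FakeParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")

# The image lookup only ever sees the default value, fixed at definition time.
lookup_value = resolve_for_image_lookup("instance_type", param)

# The processor itself would still hold the parameter object, so the
# instance type remains overridable when the pipeline is executed.
processor_instance_type = param
```

In this sketch, overriding ProcessingInstanceType at execution time would change the instance the job runs on but not the image that was looked up, which is why the warning mentions the default_value.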