ValueError: instance_type should not be a pipeline variable in SKLearnProcessor

Open zerualem opened this issue 2 years ago • 3 comments

Describe the bug

The sagemaker.sklearn.processing SKLearnProcessor object throws a ValueError when a sagemaker.workflow.parameters.ParameterString is passed as instance_type. I have been running the exact same script and never had this issue before.

To reproduce

from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat
)
from sagemaker.sklearn.processing import SKLearnProcessor

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")
framework_version = "0.23-1"

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name="sk_preprocess",
    role=role,
)

Screenshots or logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9898fc9aebc0> in <module>
      8     instance_count=processing_instance_count,
      9     base_job_name="sk_preprocess",
---> 10     role=role,
     11 )

/opt/conda/lib/python3.7/site-packages/sagemaker/sklearn/processing.py in __init__(self, framework_version, role, instance_type, instance_count, command, volume_size_in_gb, volume_kms_key, output_kms_key, max_runtime_in_seconds, base_job_name, sagemaker_session, env, tags, network_config)
     89 
     90         image_uri = image_uris.retrieve(
---> 91             defaults.SKLEARN_NAME, region, version=framework_version, instance_type=instance_type
     92         )
     93 

/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
    115     for name, val in args.items():
    116         if is_pipeline_variable(val):
--> 117             raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
    118 
    119     if is_jumpstart_model_input(model_id, model_version):

ValueError: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)

System information

  • SageMaker Python SDK version: 2.94.0
  • I am working in a SageMaker Studio notebook

zerualem, Jun 28 '22 16:06

@zerualem I had this problem too, but a new release that fixes it (2.97.0) came out a few minutes ago.

bobbywlindsey, Jun 28 '22 21:06

@bobbywlindsey thanks for the suggestion. After upgrading to SageMaker 2.97, I now get a warning instead of an error:

WARNING:root:instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>). The default_value of this Parameter object will be used to override it. Please remove this pipeline variable and use python primitives instead.

zerualem, Jun 30 '22 20:06

Hi @zerualem and @bobbywlindsey, sorry for the confusing warning message. I've opened a PR (see above) to improve the warning message.

FYI: the warning you're seeing is emitted when the image_uri is retrieved based on instance_type.

  • If no image_uri is passed to the SKLearnProcessor, the default value of instance_type (a plain string) is used to retrieve the image_uri for the processor/estimator. This lookup (retrieving the image_uri based on instance_type) cannot be parameterized; the image is only parameterized if the user passes in the image_uri directly as a ParameterString.
  • On the other hand, the instance_type of the processor/estimator itself can still be parameterized, e.g. by passing a ParameterString. This behavior still works.
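
To illustrate the two bullets above, here is a minimal, self-contained sketch of the behavior as described in this thread. It is not the SDK's actual implementation: StandInParameter and resolve_for_image_lookup are hypothetical names made up for this example, and only the standard library is used.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class StandInParameter:
    """Hypothetical stand-in for a pipeline parameter such as ParameterString."""

    name: str
    default_value: str


def resolve_for_image_lookup(arg_name, value):
    """Return a plain value that can be used for the image lookup.

    Assumption for this sketch: the lookup happens when the processor is
    defined, so a pipeline parameter has no runtime value yet and only its
    default_value is available.
    """
    if isinstance(value, StandInParameter):
        logger.warning(
            "%s is a pipeline variable; its default_value %r is used for the "
            "image lookup only.",
            arg_name,
            value.default_value,
        )
        return value.default_value
    # Plain Python values pass through unchanged.
    return value


instance_type = StandInParameter(
    name="ProcessingInstanceType", default_value="ml.t3.large"
)

# Image lookup: needs a concrete string, so the default value is used.
lookup_instance_type = resolve_for_image_lookup("instance_type", instance_type)

# Job configuration: keeps the parameter object itself, so the instance type
# can still be overridden per pipeline execution.
job_instance_type = instance_type
```

If this reading is right, the practical consequence is that the image is chosen from the parameter's default instance type, even when a different instance type is supplied for a particular pipeline execution.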

qidewenwhen, Jul 20 '22 19:07