sagemaker-python-sdk

Specifying image_uri in PyTorchModel gives TypeError when running deploy

Open lc-billyfung opened this issue 3 years ago • 9 comments

Describe the bug When creating a PyTorchModel with a specified image_uri and deploying it to an endpoint, the model object has the attribute self.framework_version=None. The check in _is_mms_version then raises an error, because it runs a regex search on an input of type None instead of a string or bytes-like object.

To reproduce

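from sagemaker.pytorch import PyTorchModel
from sagemaker.utils import name_from_base

# model_artifact, role, and endpoint_name are defined elsewhere (not shown in the report).
model = PyTorchModel(
    model_data=model_artifact,
    name=name_from_base('model'),
    role=role,
    entry_point="torchserve-predictor.py",
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.7.1-cpu-py36-ubuntu18.04",
)

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=endpoint_name)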

Expected behavior I expect the behavior to be the same as when providing framework_version and py_version into the creation of a PyTorchModel

Screenshots or logs

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, **kwargs)
    740                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
    741 
--> 742         self._create_sagemaker_model(instance_type, accelerator_type, tags)
    743         production_variant = sagemaker.production_variant(
    744             self.name, instance_type, initial_instance_count, accelerator_type=accelerator_type

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags)
    306                 /api/latest/reference/services/sagemaker.html#SageMaker.Client.add_tags
    307         """
--> 308         container_def = self.prepare_container_def(instance_type, accelerator_type=accelerator_type)
    309 
    310         self._ensure_base_name_if_needed(container_def["Image"])

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/pytorch/model.py in prepare_container_def(self, instance_type, accelerator_type)
    237 
    238         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 239         self._upload_code(deploy_key_prefix, repack=self._is_mms_version())
    240         deploy_env = dict(self.env)
    241         deploy_env.update(self._framework_env_vars())

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/pytorch/model.py in _is_mms_version(self)
    282         """
    283         lowest_mms_version = packaging.version.Version(self._LOWEST_MMS_VERSION)
--> 284         framework_version = packaging.version.Version(self.framework_version)
    285         return framework_version >= lowest_mms_version

~/.pyenv/versions/lib/python3.6/site-packages/packaging/version.py in __init__(self, version)
    294 
    295         # Validate the version and parse it into pieces
--> 296         match = self._regex.search(version)
    297         if not match:
    298             raise InvalidVersion("Invalid version: '{0}'".format(version))

TypeError: expected string or bytes-like object
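
The last frame of the trace can be reproduced without SageMaker: a compiled regex's search method rejects None. Below is a minimal sketch using only the standard library. The pattern, the parse helper, and the is_at_least helper are hypothetical stand-ins, not the actual code of the SDK or of packaging, and the None guard only illustrates one possible way to avoid the crash. The threshold "1.2" is an arbitrary example value.

```python
import re

# Hypothetical stand-in for the pattern used inside packaging.version.Version.
VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def parse(version):
    """Parse a dotted version string into a tuple of ints."""
    match = VERSION_RE.search(version)  # raises TypeError when version is None
    if not match:
        raise ValueError(f"Invalid version: {version!r}")
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, lowest):
    """Hypothetical guarded check: report 'unknown' (None) when no version is set."""
    if version is None:
        return None
    return parse(version) >= parse(lowest)


try:
    parse(None)
except TypeError as exc:
    print("TypeError:", exc)

print(is_at_least(None, "1.2"))
print(is_at_least("1.7.1", "1.2"))
```

With a guard like this, the caller would still have to decide what to do when the version is unknown, for example by inspecting the image tag.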

System information A description of your system. Please provide:

  • SageMaker Python SDK version: 2.29.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): Pytorch
  • Framework version: 1.7.1
  • Python version: 3.6.12
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Thanks

lc-billyfung avatar Mar 10 '21 21:03 lc-billyfung

I was able to replicate the bug with the following system information:

SageMaker Python SDK version: 2.41.0
Framework name (eg. PyTorch) or algorithm (eg. KMeans): Pytorch
Framework version: 1.7.1
Python version: 3.6.12
CPU or GPU: CPU
Custom Docker image (Y/N): N

Also reported it to AWS support on the 20th of May.

purplexed avatar May 21 '21 09:05 purplexed

Affects me as well, workaround seems to be to just provide a dummy version, but an annoying bug all the same.

johann-petrak avatar Jun 21 '21 13:06 johann-petrak

Same for me!

zorrofox avatar Aug 24 '21 10:08 zorrofox

Same for me. For the huggingface predictor it actually works, but it doesn't use the image I built but rather the default one...

oborchers avatar Aug 27 '21 10:08 oborchers

Update: After figuring out how to work with the repository for SageMaker images, I was able to fix my problems (which were solely about the HuggingfaceModel not being able to load or run custom images): https://github.com/aws/deep-learning-containers

oborchers avatar Aug 30 '21 07:08 oborchers

dummy version, but an annoying bug all the same

Hi, do you have an example of your workaround?

AliNGatGeeks avatar Aug 23 '22 05:08 AliNGatGeeks

As far as I remember I just added the parameter framework_version="1.8.1"
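
A sketch of what that workaround might look like, assuming the same arguments as the original report. The S3 path and role ARN are hypothetical placeholders. The arguments are collected in a plain dict so the snippet runs without the SDK installed; the commented lines show how they would be passed on.

```python
# Keyword arguments intended for sagemaker.pytorch.PyTorchModel.
# Values marked "placeholder" are hypothetical and must be replaced.
model_kwargs = {
    "model_data": "s3://my-bucket/model.tar.gz",  # placeholder
    "role": "arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    "entry_point": "torchserve-predictor.py",
    "image_uri": (
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
        "pytorch-inference:1.7.1-cpu-py36-ubuntu18.04"
    ),
    # Workaround reported in this thread: pass an explicit version string
    # so that framework_version is no longer None.
    "framework_version": "1.8.1",
}

# With the SDK installed and AWS credentials configured:
# from sagemaker.pytorch import PyTorchModel
# model = PyTorchModel(**model_kwargs)
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```

Per the traceback, this value feeds _is_mms_version(), which sets the repack argument of _upload_code, so a version that matches the image's actual framework version may be the safer choice.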

I can't believe that this issue is still open. The way AWS issues get ignored by Amazon developers is rather disappointing.

johann-petrak avatar Aug 23 '22 06:08 johann-petrak

Thanks, this seems to work for me as well. I hope they fix it soon 😄

AliNGatGeeks avatar Aug 23 '22 10:08 AliNGatGeeks

I hope they fix it soon 😄

Me looking at my inbox and laughing frenetically: No.

oborchers avatar Aug 23 '22 12:08 oborchers

Does your framework_version="1.8.1" solution definitely call the image from image_uri rather than fetching a different image via the framework_version arg?
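
One way to check this on a deployed endpoint is to read the image back from the SageMaker API. A minimal sketch: the helper name deployed_image_uris is hypothetical, and sm_client is assumed to behave like boto3.client("sagemaker"), whose describe_endpoint, describe_endpoint_config, and describe_model calls are used here.

```python
def deployed_image_uris(sm_client, endpoint_name):
    """Return the container image URIs behind a SageMaker endpoint.

    sm_client is assumed to behave like boto3.client("sagemaker").
    """
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    images = []
    for variant in config["ProductionVariants"]:
        model = sm_client.describe_model(ModelName=variant["ModelName"])
        # Single-container models report PrimaryContainer;
        # multi-container models report a Containers list instead.
        if "PrimaryContainer" in model:
            containers = [model["PrimaryContainer"]]
        else:
            containers = model.get("Containers", [])
        images.extend(c["Image"] for c in containers if "Image" in c)
    return images


# Usage, with boto3 installed and AWS credentials configured:
# import boto3
# print(deployed_image_uris(boto3.client("sagemaker"), "my-endpoint"))
```

If the returned URI matches the image_uri passed to PyTorchModel, the custom image is the one being served.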

Michael-Bar avatar Dec 02 '22 13:12 Michael-Bar

@Michael-Bar I have the same question. Did you solve this problem?

eunseoada avatar Nov 03 '23 15:11 eunseoada

Hi all,

https://github.com/aws/sagemaker-python-sdk/pull/3188 has partially addressed the problem.

Still, some ambiguity remains for the specification of Models if py_version, framework_version and image_uri are all passed.

jjerphan avatar Nov 23 '23 11:11 jjerphan

Closing as fixed by #3188

martinRenou avatar Dec 15 '23 10:12 martinRenou