Local deployment is not working on Windows 10

Open ViktorStepanukCN opened this issue 5 years ago • 5 comments

Describe the bug Trained model artifacts are not downloaded from S3 during deploy on Windows 10.

To reproduce Demonstrated with the example from https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving/tensorflow_script_mode_training_and_serving.ipynb

import sagemaker
from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

sagemaker_session = sagemaker.Session()
role = get_execution_role()
region = sagemaker_session.boto_session.region_name

training_data_uri = 's3://sagemaker-sample-data-{}/tensorflow/mnist'.format(region)

mnist_estimator2 = TensorFlow(entry_point='mnist2.py',
                             role=role,
                             train_instance_count=1,
                             train_instance_type='local',
                             framework_version='2.0.0',
                             py_version='py3')

mnist_estimator2.fit(training_data_uri)

predictor2 = mnist_estimator2.deploy(initial_instance_count=1, instance_type='local')

Expected behavior Model artifacts should be downloaded from S3 and be accessible to the serving container.

Screenshots or logs [image]

System information A description of your system. Please provide:

  • SageMaker Python SDK version: sagemaker==1.50.10.post0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): TensorFlow
  • Framework version: 2.0
  • Python version: 3.7.6
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context The problem comes from how the S3ModelArtifacts path is obtained. In \sagemaker\local\image.py, the method def retrieve_artifacts(self, compose_data, output_data_config, job_name) returns the artifact path using a simple return os.path.join(output_data, "model.tar.gz"). When this is called on Windows, it produces something like: ../tensorflow-training-2020-02-19-15-57-14-207\model.tar.gz. When SageMaker afterwards tries to download the artifacts from S3 in \sagemaker\utils.py using the method def download_folder(bucket_name, prefix, target, sagemaker_session), it fails to retrieve the files when calling bucket.objects.filter(Prefix=prefix) because of the \ in front of model.tar.gz.
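The separator behaviour described above can be reproduced on any operating system with a small sketch. It uses the standard-library modules ntpath (the Windows flavour of os.path) and posixpath, and it imitates the prefix filter with plain string matching rather than a real S3 call. The bucket name, job name, and key listing are hypothetical, chosen only for illustration.

```python
import ntpath      # Windows implementation of os.path, importable on any OS
import posixpath   # POSIX implementation of os.path, always joins with "/"

# Hypothetical output location (assumption); the real one comes from the training job.
output_data = "s3://example-bucket/example-job"

# What os.path.join does on Windows: the separator is a backslash.
windows_style = ntpath.join(output_data, "model.tar.gz")

# What an S3 URI needs: a forward slash on every platform.
s3_style = posixpath.join(output_data, "model.tar.gz")

# Illustrative key listing (assumption): S3 keys use "/" as the delimiter.
stored_keys = ["example-job/model.tar.gz"]


def keys_with_prefix(prefix):
    """Mimic a prefix filter with plain string matching (no AWS call is made)."""
    return [key for key in stored_keys if key.startswith(prefix)]


print(windows_style)
print(s3_style)
print(keys_with_prefix("example-job\\model.tar.gz"))  # backslash prefix
print(keys_with_prefix("example-job/model.tar.gz"))   # forward-slash prefix
```

The backslash variant matches no key in this toy listing, which is consistent with the empty result reported for bucket.objects.filter(Prefix=prefix) in this issue.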

ViktorStepanukCN avatar Feb 19 '20 16:02 ViktorStepanukCN

Thank you for submitting a detailed bug report. It appears this issue was fixed in https://github.com/aws/sagemaker-python-sdk/pull/1302, which was released in v1.50.14.

Please try updating your version of the SageMaker Python SDK.

ajaykarpur avatar Feb 25 '20 18:02 ajaykarpur

Hi, thank you for the reply. I tried version 1.50.16.dev0 and the problem still remains. It looks like the mentioned fix was for a similar problem, but in a different place.

The method that retrieves model artifacts ( retrieve_artifacts in sagemaker-python-sdk-master/src/sagemaker/local/image.py ) still uses return os.path.join(output_data, "model.tar.gz")
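One possible direction, shown here only as a sketch and not as the SDK's actual fix, is to build S3 locations with a join that never depends on the host operating system. The helpers s3_join and split_s3_uri below are hypothetical names introduced for illustration, and the bucket and job names are made up. The sketch also assumes that no key legitimately contains a backslash.

```python
import posixpath
from urllib.parse import urlparse


def s3_join(base_uri, *parts):
    """Join S3 URI components with "/" on every platform (hypothetical helper)."""
    # Assumption: keys never contain a literal backslash, so any backslash
    # is treated as a stray Windows separator and converted to "/".
    base = base_uri.replace("\\", "/").rstrip("/")
    cleaned = [part.replace("\\", "/").strip("/") for part in parts]
    return posixpath.join(base, *cleaned)


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key prefix) (hypothetical helper)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical values for illustration only.
artifact_uri = s3_join("s3://example-bucket/example-job", "model.tar.gz")
bucket_name, prefix = split_s3_uri(artifact_uri)

print(artifact_uri)
print(bucket_name, prefix)
```

A prefix produced this way contains only forward slashes, so it would be suitable to pass to a prefix-based listing regardless of whether the code runs on Windows or Linux.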

ViktorStepanukCN avatar Feb 26 '20 08:02 ViktorStepanukCN

Windows support for Local Mode has been experimental and, unfortunately, has never been fully supported or tested.

Marking with a feature request label.

nadiaya avatar Feb 26 '20 18:02 nadiaya

Could you please provide more details about your use case for using local mode? That would help us a lot when prioritizing the roadmap.

Thank you!

nadiaya avatar Feb 26 '20 21:02 nadiaya

Hi, I thought I'd bring some more notice to this.

My use case is local testing. Currently, the only way to test is by using instances, storage, etc. The blog here notes use cases for local mode that are also perfectly valid for Windows machines. Without local mode, even simple debugging takes extra time and adds expense.

LukeHankey avatar Nov 08 '21 15:11 LukeHankey