
SageMaker adds wrong additional `/` when using `S3DataSource` with a nested structure

Open philschmid opened this issue 2 years ago • 9 comments

Describe the bug SageMaker wrongly adds an extra `/` when `S3DataSource` is used with files stored in a nested structure. See the screenshot of my S3 directory layout.

To reproduce

  1. Have a model with a nested structure, e.g. Stable Diffusion
  2. Try to deploy the model using S3DataSource, as in the code below:
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data={'S3DataSource':{'S3Uri': s3_model_uri + "/",'S3DataType': 'S3Prefix','CompressionType': 'None'}},
   role=role,                      # iam role with permissions to create an Endpoint
   transformers_version="4.34.1",  # transformers version used
   pytorch_version="1.13.1",       # pytorch version used
   py_version='py310',             # python version used
   model_server_workers=1,         # number of workers for the model server
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type="ml.inf2.xlarge", # AWS Inferentia Instance
    volume_size = 100
)
# ignore the "Your model is not compiled. Please compile your model before using Inferentia." warning, we already compiled our model.

Expected behavior Deployed endpoint

Screenshots or logs Error: UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-neuronx-2023-11-07-14-07-46-274: Failed. Reason: error: Key of model data S3 object 's3://sagemaker-us-east-2-558105141721/neuronx/sdxl//text_encoder/model.neuron' maps to invalid local file path..
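
To make the failure easier to reason about, here is a minimal sketch of why a doubled slash could lead to an "invalid local file path". It assumes (this is a guess, not confirmed SageMaker behavior) that the service derives each local path from the part of the object key that follows the configured prefix. The helper names `relative_key` and `has_empty_segment` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse


def relative_key(object_uri: str, prefix_uri: str) -> str:
    """Return the part of the object's key that follows the prefix.

    Assumption: this mirrors how a key might be mapped to a local path;
    it is not taken from the SageMaker implementation.
    """
    obj = urlparse(object_uri)
    pre = urlparse(prefix_uri)
    if obj.netloc != pre.netloc or not obj.path.startswith(pre.path):
        raise ValueError("object is not under the given prefix")
    return obj.path[len(pre.path):]


def has_empty_segment(object_uri: str) -> bool:
    """True if the object key contains an empty path segment (a '//')."""
    key = urlparse(object_uri).path.lstrip("/")
    return "" in key.split("/")


# URI copied from the error message above
obj_uri = "s3://sagemaker-us-east-2-558105141721/neuronx/sdxl//text_encoder/model.neuron"
prefix = "s3://sagemaker-us-east-2-558105141721/neuronx/sdxl/"

print(has_empty_segment(obj_uri))     # True
print(relative_key(obj_uri, prefix))  # /text_encoder/model.neuron
```

Under that assumption the relative path starts with `/`, so it looks like an absolute path rather than a file below the model directory, which would be consistent with the error text.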

System information A description of your system. Please provide:

  • SageMaker Python SDK version: 2.197.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): huggingface
  • Framework version: 3.41.1
  • Python version: py310
  • CPU or GPU: Inf2
  • Custom Docker image (Y/N): N


philschmid avatar Nov 07 '23 14:11 philschmid

What is the value of s3_model_uri on this line?

model_data={'S3DataSource':{'S3Uri': s3_model_uri + "/",'S3DataType': 'S3Prefix','CompressionType': 'None'}},

whittech1 avatar Nov 13 '23 19:11 whittech1

I tried with s3://mybucket/neuronx/sdxl/ and s3://mybucket/neuronx/sdxl. The structure is as shown in the image.
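
Since the snippet in the report appends `"/"` unconditionally, a URI that already ends in `/` would become `.../sdxl//`. A minimal sketch of a guard against that follows. `normalize_s3_prefix` is a hypothetical helper, not part of the SageMaker SDK, and because both variants reportedly fail it may only rule out one possible cause rather than fix the issue.

```python
from urllib.parse import urlparse


def normalize_s3_prefix(uri: str) -> str:
    """Return the S3 prefix URI with exactly one trailing slash.

    Note: this also collapses doubled slashes inside the prefix, which
    is only appropriate if the real keys do not contain '//'.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop empty segments produced by leading, trailing, or doubled slashes
    segments = [s for s in parsed.path.split("/") if s]
    key_prefix = "/".join(segments)
    if key_prefix:
        return f"s3://{parsed.netloc}/{key_prefix}/"
    return f"s3://{parsed.netloc}/"


for candidate in ("s3://mybucket/neuronx/sdxl/", "s3://mybucket/neuronx/sdxl"):
    print(normalize_s3_prefix(candidate))  # s3://mybucket/neuronx/sdxl/
```

The result could then be passed as `'S3Uri': normalize_s3_prefix(s3_model_uri)` instead of `s3_model_uri + "/"`.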

philschmid avatar Nov 13 '23 21:11 philschmid

Here is a full example https://github.com/philschmid/huggingface-inferentia2-samples/blob/main/stable-diffusion-xl/sagemaker-notebook.ipynb

You just need to change the "3. Upload the neuron model and inference script to Amazon S3" section and then "4. Deploy a Real-time Inference Endpoint on Amazon SageMaker"

philschmid avatar Nov 13 '23 21:11 philschmid

Hi @philschmid, I tried your repo but cannot reproduce the issue. Does the instance_type matter?

Screenshot 2023-11-23 133255

trungleduc avatar Nov 23 '23 12:11 trungleduc

I don't develop the SDK, but I tested with inf2.xlarge; maybe there is something different.

philschmid avatar Nov 23 '23 13:11 philschmid

Could you test your code with other instance types?

trungleduc avatar Nov 23 '23 13:11 trungleduc

The error occurs with inf2.xlarge, the instance I want to use to deploy the model. That's where the error appears. Why do you want to test another one?

philschmid avatar Nov 23 '23 13:11 philschmid

I want to confirm whether the issue is in the SDK logic or in another place.

trungleduc avatar Nov 23 '23 13:11 trungleduc

@trungleduc, I understand you are trying to troubleshoot the root cause of the issue, but asking me to test on other instance types doesn't seem helpful at this point. As I mentioned, the error occurs on inf2.xlarge with the version I shared. It would be more productive to dig deeper into what specifically is failing on inf2.xlarge and where this extra `/` gets added.

philschmid avatar Nov 23 '23 13:11 philschmid