sagemaker.local bug when inputs are binary files

Open • vojavocni opened this issue 11 months ago • 1 comment

Describe the bug Hello, I think I've found a bug in sagemaker.local. I'm trying to test a batch transform with images as input, but I get the following error before execution even reaches the input_fn of my custom inference script:

│   345 │   │   for element in self.splitter.split(file):
│ ❱ 346 │   │   │   if _payload_size_within_limit(buffer + element, size):
│   347 │   │   │   │   buffer += element
│   348 │   │   │   else:
│   349 │   │   │   │   tmp = buffer
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "bytes") to str

I am not using a splitter (the splitter type is None), since splitting isn't needed for images.

I believe the problem is on line 343, in the MultiRecordStrategy class: https://github.com/aws/sagemaker-python-sdk/blob/ae3cc1c44244d79ed6187fd26889449672c55af3/src/sagemaker/local/data.py#L326-L352

The buffer variable is assumed to be a str, which in turn assumes that the file variable never refers to binary content, even though binary input should be possible.
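The type mismatch can be shown without SageMaker at all. The following is a minimal sketch, not the SDK's code: `accumulate` is a hypothetical stand-in for a loop that starts from a str buffer and appends each element it receives.

```python
def accumulate(elements):
    """Hypothetical stand-in for a batching loop that starts from a str buffer."""
    buffer = ""  # assumption: the buffer is always initialized as an empty str
    for element in elements:
        buffer += element  # fails as soon as an element is bytes
    return buffer


# Text-like input (for example CSV lines) concatenates without trouble.
text_result = accumulate(["a,b\n", "c,d\n"])

# Binary input (for example the first bytes of a PNG) raises TypeError.
try:
    accumulate([b"\x89PNG\r\n"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Text input passes through, while the first bytes element triggers the same kind of TypeError as in the traceback above.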

To reproduce Just run a local batch transform with a single image as input. I don't think the model matters, since the job fails before any prediction or any interaction between the data and the model takes place.

Expected behavior I would expect the buffer to be sensitive to whether the file is text, like JSON or CSV, or a binary type, like PNG.

Screenshots or logs See above.

System information

  • SageMaker Python SDK version: 2.237.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch, custom inference and model
  • Framework version: 2.5.1
  • Python version: 3.11
  • CPU or GPU: Both
  • Custom Docker image (Y/N): Y, extending the pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker image

vojavocni avatar Jan 17 '25 15:01 vojavocni

I have a fix in my local environment after which the whole batch transform job works:

buffer = b"" if isinstance(next(self.splitter.split(file)), bytes) else ""

So I can confirm this is the issue. I can make a small PR if this seems ok?
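One possible variation on the one-liner, sketched here outside the SDK: derive the empty buffer from the first element inside the loop, so the split iterator is consumed only once and an empty input needs no special handling. Everything below is an assumption for illustration: `batch_elements` is a hypothetical helper, and the limit check is a plain `len` comparison rather than the SDK's `_payload_size_within_limit`.

```python
from typing import Iterable, Iterator, Optional, Union

Payload = Union[str, bytes]


def batch_elements(elements: Iterable[Payload], max_size: int) -> Iterator[Payload]:
    """Hypothetical helper: group elements into payloads of at most max_size.

    Works for str and bytes alike, because the buffer takes its type
    from the first element instead of being hard-coded as "".
    """
    buffer: Optional[Payload] = None
    for element in elements:
        if buffer is None:
            # element[:0] is "" for str and b"" for bytes
            buffer = element[:0]
        if len(buffer) + len(element) <= max_size:
            buffer += element
        else:
            if buffer:
                yield buffer
            # Simplification: an element larger than max_size is passed
            # through unchanged rather than rejected.
            buffer = element
    if buffer:
        yield buffer


binary_batches = list(batch_elements([b"ab", b"cd", b"ef"], max_size=4))
text_batches = list(batch_elements(["a\n", "b\n", "c\n"], max_size=4))
empty_batches = list(batch_elements([], max_size=4))
```

The slice `element[:0]` is what keeps the helper type-agnostic; an `isinstance` check on the first element, as in the one-liner above, would serve the same purpose.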

vojavocni avatar Jan 17 '25 15:01 vojavocni