sagemaker-python-sdk

sagemaker.session.download_data() is unable to download S3 content.

Open anrikus opened this issue 1 year ago • 0 comments

Describe the bug

If nested objects from an S3 bucket are downloaded to a temp file using boto3.client.download_file() and then re-uploaded to another S3 bucket using boto3.client.upload_file(), sagemaker.session.download_data() is unable to download the re-uploaded objects. It fails with the error: [Errno 21] Is a directory: ....
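For context, Errno 21 is EISDIR on Linux: Python raises it when a directory path is opened for writing as if it were a file. The sketch below reproduces that error locally with the standard library only. It does not show where the SDK raises it, which this report does not identify. The helper name try_write and the temporary layout are made up for illustration.

```python
import errno
import os
import tempfile


def try_write(path):
    """Try to open `path` for binary writing; return the errno on failure, else None."""
    try:
        with open(path, "wb"):
            return None
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring the nested content dir1/file1 from the report.
    dir_path = os.path.join(tmp, "dir1")
    os.makedirs(dir_path)
    # Opening the directory itself for writing fails on POSIX systems.
    dir_result = try_write(dir_path)
    # Opening a file inside the directory succeeds.
    file_result = try_write(os.path.join(dir_path, "file1"))

print("directory open failed with EISDIR:", dir_result == errno.EISDIR)
print("file open errno:", file_result)
```

If the destination bucket holds an object whose key maps to a local directory path, for example a key ending in "/", a download that writes every key to disk could hit this error. That is an assumption about the cause, not something the report confirms.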

To reproduce

  1. Create two S3 buckets, source-s3-bucket and destination-s3-bucket, and assign them the proper permissions.
  2. Create simple nested content in source-s3-bucket, such as: dir1/file1
  3. Use the following code to download and then re-upload the content:
import boto3
import sagemaker

s3_resource = boto3.resource("s3")
s3_client = boto3.client("s3", region_name="us-west-2")
sagemaker_session = sagemaker.Session()

# Replace these placeholders with the names of your own buckets.
source_bucket = "source-s3-bucket"
dest_bucket = "destination-s3-bucket"

# Copy every object from the source bucket to the destination bucket by
# downloading it to a temporary local file and re-uploading it with s3_client.
for obj in s3_resource.Bucket(source_bucket).objects.filter(Prefix=""):
    s3_client.download_file(source_bucket, obj.key, "temp.file")
    s3_client.upload_file("temp.file", dest_bucket, obj.key)

# This call fails with: [Errno 21] Is a directory: ...
sagemaker_session.download_data(path=".", bucket=dest_bucket, key_prefix="")

Expected behavior

sagemaker_session.download_data() should be able to download the content from destination-s3-bucket to a local directory.

The AWS CLI is able to do it using:

!aws s3 cp --recursive s3://destination-s3-bucket "./"
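Until the SDK call works, a boto3-only fallback along these lines might serve as a stopgap. It is a minimal sketch, not the SDK's implementation. It assumes the failure comes from keys ending in "/" (folder placeholder objects), which this report does not confirm, and it simply skips them. The function names plan_downloads and download_prefix are hypothetical.

```python
import os


def plan_downloads(keys, dest_dir="."):
    """Map S3 keys to local file paths, skipping keys that end in "/"."""
    plan = []
    for key in keys:
        if key.endswith("/"):
            # Assumed to be a folder placeholder object; there is no file to write.
            continue
        plan.append((key, os.path.join(dest_dir, *key.split("/"))))
    return plan


def download_prefix(bucket, prefix="", dest_dir="."):
    """Download every object under `prefix` from `bucket` using boto3 only."""
    import boto3  # imported lazily so plan_downloads works without boto3 installed

    s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    for key, local_path in plan_downloads(keys, dest_dir):
        # Create parent directories locally before writing the file.
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        s3_client.download_file(bucket, key, local_path)
```

Calling download_prefix("destination-s3-bucket", dest_dir=".") would then mirror the bucket into the current directory. The key-to-path mapping is kept in its own function so it can be checked without AWS credentials.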

Screenshots or logs

[Two images attached to the original issue]

System information

  • SageMaker Python SDK version: sagemaker 2.109.0, awscli 1.25.72, boto3 1.24.71
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.8.10
  • CPU or GPU: Both
  • Custom Docker image (Y/N): N

Additional context

Should be reproducible in SageMaker Studio using PyTorch 1.10 Python 3.8 CPU and PyTorch 1.10 Python 3.8 GPU, with either the provided library versions or the latest versions as of this writing (sagemaker 2.109.0, awscli 1.25.72, boto3 1.24.71), installed using !pip install -U sagemaker boto3 awscli.

anrikus, Sep 13 '22 00:09