
Support for spaces in S3 bucket paths?

Open demitri opened this issue 3 years ago • 2 comments

Problem description

I am getting the following error when reading a file from an S3 bucket:

Invalid bucket name "xxxx:yyyy@bucket": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

The bucket path is in the format:

s3://<access_key>:<secret>@path/to/file here/filename

Note that there is a space in the path. I also note that the regular expressions above do not allow spaces. Does smart_open support spaces in S3 bucket paths?
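For reference, the first regex in the error message can be checked locally. This is a minimal sketch using only Python's `re` module; the helper name `is_valid_bucket_name` is hypothetical, and only the plain bucket-name pattern is tested, not the ARN alternatives. It suggests that the rejected value "xxxx:yyyy@bucket" fails because of the `:` and `@` characters, independent of any space elsewhere in the path.

```python
import re

# Plain bucket-name pattern copied from the error message above
# (the ARN alternatives are left out of this sketch).
BUCKET_NAME_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: True if `name` matches the pattern above."""
    return BUCKET_NAME_RE.match(name) is not None


# The value reported in the error still contains the credentials.
print(is_valid_bucket_name("xxxx:yyyy@bucket"))
# The bare bucket name on its own.
print(is_valid_bucket_name("bucket"))
# A name containing a space.
print(is_valid_bucket_name("my bucket"))
```

If the first check fails while the second passes, the credentials prefix is a likely cause of the error, separate from the question about spaces.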

Steps/code to reproduce the problem

A simple test would be to create a bucket with a space in its path name and attempt to read it.

Versions

In [1]: import platform, sys, smart_open

In [2]: print(platform.platform())
Linux-5.4.0-1029-aws-x86_64-with-glibc2.10

In [3]: print("Python", sys.version)
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]

In [4]: print("smart_open", smart_open.__version__)
smart_open 5.2.0

Checklist

Before you create the issue, please make sure you have:

  • [x] Described the problem clearly
  • [x] Provided a minimal reproducible example, including any required data
  • [x] Provided the version numbers of the relevant software

demitri avatar Aug 19 '21 19:08 demitri

The bucket naming rules state that a bucket name cannot contain a space; the regex comes from the AWS side.

The path to a file (the object name) can contain a space, but the bucket name cannot. For more information, see the AWS bucket naming rules.

In the shared URL s3://<access_key>:<secret>@path/to/file here/filename, the bucket name is path and the prefix is /to/file here/filename. @demitri, can you check and confirm whether that is correct?
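The split described above can be illustrated with Python's standard `urllib.parse`. This is only a sketch of how a generic URL parser divides such a string; it is not smart_open's own parsing code, the function name `split_s3_url` is hypothetical, and `ACCESSKEY` and `SECRET` are placeholder values. The key is shown without its leading slash.

```python
from urllib.parse import urlsplit


def split_s3_url(url: str):
    """Hypothetical helper: return (bucket, key) from an s3:// URL.

    Uses generic URL splitting only; this is not smart_open's parser.
    """
    parts = urlsplit(url)
    # Drop an optional "user:password@" prefix from the network location.
    bucket = parts.netloc.rpartition("@")[2]
    # Everything after the bucket is the object key; spaces are kept as-is.
    key = parts.path.lstrip("/")
    return bucket, key


bucket, key = split_s3_url("s3://ACCESSKEY:SECRET@path/to/file here/filename")
print(bucket)  # the first element after the credentials
print(key)     # the remainder, including the space
```

Under this reading, the space ends up in the object key, where it is allowed, and the bucket name is the single element "path".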

ChandanChainani avatar Oct 27 '21 04:10 ChandanChainani

I'm trying to find the exact case where I encountered this (it was two months ago). I don't think the bucket name contained a space; I believe it was the file (or a path element?). I will try to track this down.

demitri avatar Oct 28 '21 23:10 demitri