smart_open
Support for spaces in S3 bucket paths?
Problem description
I am getting the following error when reading a file from an S3 bucket:
Invalid bucket name "xxxx:yyyy@bucket": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
The bucket path is in the format:
s3://<access_key>:<secret>@path/to/file here/filename
Note that there is a space in the path, and that the regular expressions above do not allow spaces. Does smart_open support spaces in S3 bucket paths?
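As a quick check of what the first regular expression in the error message accepts, here is a minimal sketch using only Python's standard `re` module. The pattern is copied from the error message above; the helper name `is_valid_bucket_name` is hypothetical and is not part of smart_open or boto3. It suggests that the rejected string `xxxx:yyyy@bucket` fails because of the `:` and `@` characters, not only because of a space.

```python
import re

# First pattern from the error message above (plain bucket names, not ARNs).
BUCKET_NAME_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: True if `name` matches the bucket-name pattern."""
    return BUCKET_NAME_RE.match(name) is not None


# The string from the error message still contains the credentials,
# so ':' and '@' make it fail the pattern.
print(is_valid_bucket_name("xxxx:yyyy@bucket"))
# The bare bucket name on its own matches.
print(is_valid_bucket_name("bucket"))
# A name containing a space does not match.
print(is_valid_bucket_name("my bucket"))
```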
Steps/code to reproduce the problem
A simple test would be to create a bucket with a space in its path name and attempt to read it.
Versions
In [1]: import platform, sys, smart_open
In [2]: print(platform.platform())
Linux-5.4.0-1029-aws-x86_64-with-glibc2.10
In [3]: print("Python", sys.version)
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
In [4]: print("smart_open", smart_open.__version__)
smart_open 5.2.0
Checklist
Before you create the issue, please make sure you have:
- [x] Described the problem clearly
- [x] Provided a minimal reproducible example, including any required data
- [x] Provided the version numbers of the relevant software
The bucket naming rules state that a bucket name cannot contain a space; the regex in the error message comes from the AWS side.
The path to a file (the object name) can contain a space, but the bucket name cannot. For more information, you can check here.
From the shared URL s3://<access_key>:<secret>@path/to/file here/filename,
the bucket name is path
and the prefix is /to/file here/filename.
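The breakdown above can be illustrated with Python's standard `urllib.parse.urlsplit`. This is only a sketch of how a generic URL parser reads the string; it is not smart_open's own parsing code, and the helper name `split_s3_url` and the placeholder credentials `ACCESS` and `SECRET` are assumptions made for the example.

```python
from urllib.parse import urlsplit


def split_s3_url(url: str) -> dict:
    """Hypothetical helper: split an s3:// URL with a generic URL parser."""
    parts = urlsplit(url)
    return {
        "access_key": parts.username,  # text before ':' in the userinfo part
        "secret": parts.password,      # text between ':' and '@'
        "bucket": parts.hostname,      # host part (urlsplit lowercases it)
        "key": parts.path.lstrip("/"), # everything after the bucket
    }


# Placeholder credentials stand in for <access_key> and <secret>.
result = split_s3_url("s3://ACCESS:SECRET@path/to/file here/filename")
print(result["bucket"])
print(result["key"])
```

Under this reading the space ends up in the object key, not in the bucket name. Note that with this standard-library parser, a secret containing a `/` would end the host part early and be split incorrectly, so the result depends on the credentials being URL-safe.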
@demitri, can you check and confirm whether this is correct?
I'm trying to find the exact case where I encountered this (it was two months ago). I don't think the bucket name contained a space; I believe it was the file (or a path element?). I will try to track this down.