filesystem_spec
FileNotFound for s3 file with / in the key
Hello! I am trying to read a CSV stored in a remote S3 bucket with a "/" in the key, like this: 's3://mybucket/path/7/update//nicktorba.part_00000'
When I run this code, I get a FileNotFoundError (as far as I can tell, this is the code pandas uses under the hood for read_csv):
file_obj = fsspec.open(
    filepath, mode="rb", **(storage_options or {})
).open()
However, this code runs successfully:
s3_client = boto3.client(
    's3',
    aws_access_key_id=storage_options["key"],
    aws_secret_access_key=storage_options["secret"],
)
s3_object = s3_client.get_object(
    Bucket="my-bucket",
    Key="path/7/update//nicktorba.part_00000"
)
df = pd.read_csv(s3_object['Body'], nrows=5)
Is there any way I can update the args to fsspec.open in the first code snippet to have it successfully read the file? I'm positive the file exists and my access is set up correctly, because the second snippet works.
Thank you!
(side note: I am unfortunately not the one in charge of the file naming, so removing the slash isn't an option at the moment)
You suspect that it is the double slash "//" that is the problem? Do other paths in the same bucket work?
The first thing I would do, is turn on s3fs logging to see exactly what calls are being made. One way:
fsspec.utils.setup_logging(logger_name="s3fs")
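For reference, setup_logging essentially attaches a DEBUG-level stream handler to the named logger. A stdlib-only sketch of the same effect (the format string is an assumption, chosen to match the log lines below):

```python
import logging

# Roughly what fsspec.utils.setup_logging(logger_name="s3fs") does:
# attach a timestamped stream handler and raise the logger to DEBUG.
logger = logging.getLogger("s3fs")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s -- %(message)s"
    )
)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
```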
@martindurant Here are the logs shown:
2023-07-11 09:22:35,439 - s3fs - DEBUG - connect -- Setting up s3fs instance
2023-07-11 09:22:35,439 - s3fs - DEBUG - Setting up s3fs instance
2023-07-11 09:22:35,509 - s3fs - DEBUG - _lsdir -- Get directory listing page for mybucket/path/7/update
2023-07-11 09:22:35,509 - s3fs - DEBUG - Get directory listing page for mybucket/path/7/update
2023-07-11 09:22:36,422 - s3fs - DEBUG - _lsdir -- Get directory listing page for mybucket/path/7/update//nicktorba.part_00000
2023-07-11 09:22:36,422 - s3fs - DEBUG - Get directory listing page for mybucket/path/7/update//nicktorba.part_00000
Under that it throws the same FileNotFoundError.
Other paths in the same bucket work as expected.
Hm, s3fs should not be calling LIST upon open, but HEAD. It can be the case that you have permissions for one but not the other, and for listing, the "/" character is indeed special. What version of s3fs are you using?
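To see why "/" is special for listing but not for a direct key lookup: S3 has no real directories, keys are flat strings, and a LIST with delimiter="/" synthesizes directory levels, so "//" yields an empty path component that a listing-based lookup can trip over, while HEAD/GET addresses the exact key string. A toy, stdlib-only model of that behavior (key names reused from the report above for illustration):

```python
keys = ["path/7/update//nicktorba.part_00000"]

def list_prefix(keys, prefix, delimiter="/"):
    """Toy model of S3 ListObjectsV2 with a delimiter: return the
    immediate children (objects or synthesized "directories") under prefix."""
    out = set()
    for k in keys:
        if k.startswith(prefix):
            rest = k[len(prefix):]
            head, sep, _ = rest.partition(delimiter)
            out.add(prefix + head + sep)
    return out

# Exact-key lookup (what HEAD/GET does) matches the key string verbatim:
assert "path/7/update//nicktorba.part_00000" in keys

# Listing under "path/7/update/" only surfaces the empty component "//":
assert list_prefix(keys, "path/7/update/") == {"path/7/update//"}

# You have to list under the double slash itself to reach the object:
assert list_prefix(keys, "path/7/update//") == {
    "path/7/update//nicktorba.part_00000"
}
```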
@martindurant My current s3fs.__version__ is '0.4.0'. I just updated to 2023.6.0 and it seems to hit the same problem.
Also, I get a FileNotFound when directly calling head on s3fs:
s3 = s3fs.S3FileSystem(
key=storage_options["key"],
secret=storage_options["secret"]
)
s3.head(filepath)
I'm not sure if that is the operation you meant when you said it should be called instead of list.
No, that's a different HEAD :|
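For anyone following along: in fsspec, fs.head(path, n) is the Unix-style head, which opens the file and returns its first bytes, while the metadata lookup analogous to an HTTP HEAD request is fs.info(path). A quick illustration with the in-memory filesystem (not S3, but the API is shared across backends):

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe("/demo.txt", b"hello world")

# Unix-style head: opens the file and reads the leading bytes.
assert fs.head("/demo.txt", 5) == b"hello"

# The metadata call (what an S3 HEAD request backs) is fs.info():
assert fs.info("/demo.txt")["size"] == 11
```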
The following test fails, so there is indeed a problem, likely in fsspec
def test_multi_slash(s3):
    fn = "test/path//with/slash"
    s3.pipe(fn, b"data")
    files = s3.find("test/path", detail=False)
    assert fn in files
    with fsspec.open(f"s3://{fn}") as f:  # <- fails here
        assert f.read() == b"data"
Switching fsspec.open for s3.open makes the test pass, so this is definitely in fsspec; it must be converting "//" to "/". This finding at least gives you a workaround.
@martindurant What is the s3 object in that test?
Also, unfortunately, I'm hitting this error from pandas.read_csv, so I can't easily replace that code. I opened an issue on pandas as well since I wasn't sure which one would be better: https://github.com/pandas-dev/pandas/issues/54070
s3 is the S3FileSystem instance. I understand that this should work directly with pandas.read_csv, but for the moment you could do:
s3 = fsspec.filesystem("s3", **storage_options)
with s3.open(path) as f:
    df = pd.read_csv(f)
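The same pattern, shown end to end with the in-memory filesystem standing in for S3 (key name and CSV content invented for the sketch): handing read_csv an already-open file object means no URL is parsed at all, so the "//" in the key survives untouched.

```python
import fsspec
import pandas as pd

# Stand-in for S3: the in-memory filesystem, with a key containing "//".
fs = fsspec.filesystem("memory")
fs.pipe("path/7/update//part_00000", b"a,b\n1,2\n")

# The open file object goes straight to pandas; no URL parsing occurs.
with fs.open("path/7/update//part_00000") as f:
    df = pd.read_csv(f)

assert list(df.columns) == ["a", "b"]
```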
What version of fsspec are you running? I have 2022.10.0, and that code fails with FileNotFound as well.
2023.6.0
Correction: my test was wrong; things really do work fine with the test server. The following public file works fine:
In [19]: with fsspec.open('s3://mymdtemp/path//with/slash', anon=True) as f:
    ...:     print(f.read())
    ...:
b'hello'