s3fs
opendir() fails on same path that listdir() works on
Not sure why, but opendir() apparently only works on the root directory. If you try to use it with a subdirectory that exists, you get a ResourceNotFound error.
>>> import fs
>>> s3 = fs.open_fs("s3://my-bucket")
# The directory clearly exists...
>>> s3.listdir("/path/to/directory")
['foo.txt', 'bar.txt']
# ... but sad times if you try to open it
>>> root = s3.opendir("/path/to/directory")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1207, in opendir
    if not self.getbasic(path).is_dir:
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1525, in getbasic
    return self.getinfo(path, namespaces=["basic"])
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 441, in getinfo
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '/path/to/directory' not found
I've tried this with both a leading and a trailing slash, and it still breaks.
If I try this...
>>> s3 = fs.open_fs("s3://my-bucket/path/to/directory")
>>> s3.listdir("/")
['foo.txt', 'bar.txt']
# I can open files
>>> with s3.open('foo.txt', 'rb') as fd:
...     print(len(fd.read()))
36256176
So far so good. However, if I try using filterdir() it breaks even though I was just able to open a .txt file:
>>> list(s3.filterdir("*.txt"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 695, in scandir
    info = self.getinfo(path)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 451, in getinfo
    obj = self._get_object(path, _key)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 351, in _get_object
    return obj
  File "/Users/dargueta/.pyenv/versions/3.7.6/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 183, in s3errors
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '*.txt' not found
I suspect something's wrong with the way SubFS is getting created.
(Duplicate of #8, but that was closed a long time ago with no resolution.)
I think this is probably related to https://fs-s3fs.readthedocs.io/en/latest/#limitations and to previously reported issues such as #62 and others. It can be tricky to use S3FS on buckets where files were previously created with, e.g., boto3, because the presence of a file "foo/bar" does not imply the existence of a directory object "foo/" (an empty object whose key is "foo/"). S3FS requires such objects to be present for some operations. If you try s3.makedir("/path/to/directory") first, then listdir should work.
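Something along these lines, I'd expect (untested; using makedirs(..., recreate=True) instead of makedir is my guess for covering the intermediate levels without errors on markers that already exist):

>>> s3 = fs.open_fs("s3://my-bucket")
>>> # create the empty "path/", "path/to/", "path/to/directory/" marker objects
>>> _ = s3.makedirs("/path/to/directory", recreate=True)
>>> s3.opendir("/path/to/directory").listdir("/")
['foo.txt', 'bar.txt']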
As an alternative, you could look at #60 and at https://github.com/mrk-its/s3fs. Or write a script crawling your bucket and creating all missing directories, I guess.
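A rough sketch of such a crawl script, using boto3 directly (the bucket name is illustrative and I haven't run this against a real bucket):

import boto3

def create_missing_directory_markers(bucket):
    """Create an empty "<prefix>/" object for every prefix that lacks one."""
    s3 = boto3.client("s3")

    # Collect every key in the bucket.
    keys = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])

    # Derive every ancestor prefix, e.g. "a/b/c.txt" -> {"a/", "a/b/"}.
    prefixes = set()
    for key in keys:
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:i]) + "/")

    # Any prefix without an existing marker object gets an empty one,
    # which is what S3FS treats as the directory.
    for prefix in sorted(prefixes - keys):
        s3.put_object(Bucket=bucket, Key=prefix, Body=b"")

create_missing_directory_markers("my-bucket")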
So I get the S3 empty-object thing, but I've been doing some thinking and I think, with some finagling, it may be possible to simulate nonexistent directories by using prefix-based searching instead of relying on empty objects as "sentinels", for lack of a better term. This would complicate some things like stat, since it would have to look for both that empty object and, failing that, see if there are any keys with a matching prefix, but I think that can be done without too much of a performance impact.
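Roughly what I'm picturing, as a standalone sketch against boto3 rather than the actual S3FS internals (the function name and structure are just illustrative):

import boto3
from botocore.exceptions import ClientError

def directory_exists(client, bucket, path):
    """Treat `path` as a directory if its empty marker object exists,
    or, failing that, if any key starts with that prefix."""
    prefix = path.strip("/") + "/"
    # 1. Look for the explicit empty "directory" object first.
    try:
        client.head_object(Bucket=bucket, Key=prefix)
        return True
    except ClientError:
        pass
    # 2. Fall back to a prefix search; a single matching key is enough.
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response["KeyCount"] > 0

# e.g. directory_exists(boto3.client("s3"), "my-bucket", "path/to/directory")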