
opendir() fails on same path that listdir() works on

dargueta opened this issue 3 years ago · 2 comments

Not sure why, but opendir() apparently only works on the root directory. If you try to use it with a subdirectory that exists, you get a ResourceNotFound error.

>>> import fs
>>> s3 = fs.open_fs("s3://my-bucket")

# The directory clearly exists...
>>> s3.listdir("/path/to/directory")
['foo.txt', 'bar.txt']

# ... but sad times if you try to open it
>>> root = s3.opendir("/path/to/directory")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1207, in opendir
    if not self.getbasic(path).is_dir:
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1525, in getbasic
    return self.getinfo(path, namespaces=["basic"])
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 441, in getinfo
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '/path/to/directory' not found

I've tried this with both leading and trailing slashes and it still breaks.

If I try this...

>>> s3 = fs.open_fs("s3://my-bucket/path/to/directory")

>>> s3.listdir("/")
['foo.txt', 'bar.txt']

# I can open files
>>> with s3.open('foo.txt', 'rb') as fd:
...     print(len(fd.read()))
36256176

So far so good. However, if I try using filterdir() it breaks even though I was just able to open a .txt file:

>>> list(s3.filterdir("*.txt"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 695, in scandir
    info = self.getinfo(path)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 451, in getinfo
    obj = self._get_object(path, _key)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 351, in _get_object
    return obj
  File "/Users/dargueta/.pyenv/versions/3.7.6/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 183, in s3errors
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '*.txt' not found

I suspect something's wrong with the way SubFS is getting created.

(Duplicate of #8 but that was closed a long time ago with no resolution)

dargueta · Nov 03 '20 19:11

I think this is probably related to https://fs-s3fs.readthedocs.io/en/latest/#limitations, and to previously reported issues such as #62 and others. It can be tricky to use S3FS on buckets whose files were created by other tools, e.g. boto3, because the presence of a file "foo/bar" does not imply the existence of a directory object "foo/" (an empty object whose key is "foo/"). S3FS requires such objects to be present for some operations. If you first run s3.makedir("/path/to/directory"), then opendir should work.
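A minimal sketch of that workaround, assuming the placeholder bucket and path from the original report; makedirs with recreate=True should create the missing marker objects without failing if they already exist:

>>> import fs
>>> s3 = fs.open_fs("s3://my-bucket")
>>> # Create empty "path/", "path/to/", "path/to/directory/" marker objects.
>>> s3.makedirs("/path/to/directory", recreate=True)
>>> # With the markers in place, opendir() should now succeed.
>>> subdir = s3.opendir("/path/to/directory")
>>> subdir.listdir("/")
['foo.txt', 'bar.txt']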

As an alternative, you could look at #60 and at https://github.com/mrk-its/s3fs. Or write a script that crawls your bucket and creates all the missing directory objects, I guess.
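A rough sketch of such a crawl-and-repair script, assuming boto3 credentials are already configured and using "my-bucket" as a placeholder bucket name; this is not part of S3FS, just one way to backfill the directory marker objects:

import boto3

BUCKET = "my-bucket"  # placeholder bucket name
s3 = boto3.client("s3")

# Collect every prefix that should exist as a "directory" based on the
# keys already in the bucket, e.g. "a/b/c.txt" implies "a/" and "a/b/".
prefixes = set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        parts = obj["Key"].split("/")[:-1]
        for i in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:i]) + "/")

# Create an empty marker object for any prefix that has no object of its own.
for prefix in sorted(prefixes):
    try:
        s3.head_object(Bucket=BUCKET, Key=prefix)
    except s3.exceptions.ClientError:
        s3.put_object(Bucket=BUCKET, Key=prefix, Body=b"")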

desmoteo · Nov 07 '20 14:11

So I get the S3 empty-object thing, but I've been doing some thinking and I believe that, with some finagling, it may be possible to simulate nonexistent directories by using prefix-based searching instead of relying on empty objects as "sentinels", for lack of a better term. This would complicate some things like stat, since it would have to look for the empty object and, failing that, check whether any keys match the prefix, but I think that can be done without too much of a performance impact.
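A minimal sketch of that prefix-based check using boto3; the bucket name and the helper directory_exists are illustrative, not part of fs-s3fs. It tries the explicit marker object first and falls back to asking whether any key starts with the prefix:

import boto3

s3 = boto3.client("s3")


def directory_exists(bucket: str, dirpath: str) -> bool:
    """Return True if dirpath exists as an empty marker object
    or implicitly as the prefix of at least one other key."""
    prefix = dirpath.strip("/") + "/"

    # 1. Cheap path: the explicit "path/to/directory/" marker object exists.
    try:
        s3.head_object(Bucket=bucket, Key=prefix)
        return True
    except s3.exceptions.ClientError:
        pass

    # 2. Fallback: any object under the prefix implies the directory.
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0


print(directory_exists("my-bucket", "/path/to/directory"))

The extra list_objects_v2 call only happens when the marker is missing, which is roughly the "not too much of a performance impact" trade-off described above.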

dargueta · Feb 26 '21 20:02