filesystem_spec
filesystem_spec copied to clipboard
DirFileSystem.ls(path, detail=True) hits AssertionError with S3FileSystem
This is a similar issue to #924.
Code
fs = DirFileSystem(f'/{bucket_name}', S3FileSystem())
files = fs.ls('/')
Error
Traceback (most recent call last):
File "/Users/ppatterson/src/b2_zip_files/app.py", line 40, in <module>
files = fs.ls('/')
^^^^^^^^^^
File "/Users/ppatterson/live_read_demo/b2_zip_files/lib/python3.11/site-packages/fsspec/implementations/dirfs.py", line 253, in ls
entry["name"] = self._relpath(entry["name"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ppatterson/live_read_demo/b2_zip_files/lib/python3.11/site-packages/fsspec/implementations/dirfs.py", line 70, in _relpath
assert path.startswith(prefix)
AssertionError
The problem is that DirFileSystem.__init__ sets self.path via fs._strip_protocol(path), which returns /bucket_name. However, S3FileSystem._ls returns filenames of the form bucket_name/path/to/file.ext (no leading /). Hence, in the following code from DirFileSystem._relpath, the assertion fails:
prefix = self.path + self.fs.sep # '/bucket_name/'
assert path.startswith(prefix) # 'bucket_name/path/to/file.ext' does not start with '/bucket-name/'!
It doesn't look like it's possible to set a flag to have S3FileSystem prepend a / to paths, so I think the solution is to make the leading / optional in DirFileSystem._relpath. Something like:
def _relpath(self, path):
if isinstance(path, str):
if not self.path:
return path
# We need to account for S3FileSystem returning paths that do not
# start with a '/'
if path == self.path or (self.path.startswith(self.fs.sep) and path == self.path[1:]):
return ""
prefix = self.path + self.fs.sep
if self.path.startswith(self.fs.sep) and not path.startswith(self.fs.sep):
prefix = prefix[1:]
assert path.startswith(prefix)
return path[len(prefix) :]
return [self._relpath(_path) for _path in path]
I'll file a PR with this change.