filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

DirFileSystem.ls(path, detail=True) hits AssertionError with S3FileSystem

Open metadaddy opened this issue 1 year ago • 0 comments

This is a similar issue to #924.

Code

fs = DirFileSystem(f'/{bucket_name}', S3FileSystem())

files = fs.ls('/')

Error

Traceback (most recent call last):
  File "/Users/ppatterson/src/b2_zip_files/app.py", line 40, in <module>
    files = fs.ls('/')
            ^^^^^^^^^^
  File "/Users/ppatterson/live_read_demo/b2_zip_files/lib/python3.11/site-packages/fsspec/implementations/dirfs.py", line 253, in ls
    entry["name"] = self._relpath(entry["name"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ppatterson/live_read_demo/b2_zip_files/lib/python3.11/site-packages/fsspec/implementations/dirfs.py", line 70, in _relpath
    assert path.startswith(prefix)
AssertionError

The problem is that DirFileSystem.__init__ sets self.path via fs._strip_protocol(path), which returns /bucket_name. However, S3FileSystem._ls returns filenames of the form bucket_name/path/to/file.ext (no leading /). Hence, in the following code from DirFileSystem._relpath, the assertion fails:

prefix = self.path + self.fs.sep # '/bucket_name/'
assert path.startswith(prefix) # 'bucket_name/path/to/file.ext' does not start with '/bucket-name/'!

It doesn't look like it's possible to set a flag to have S3FileSystem prepend a / to paths, so I think the solution is to make the leading / optional in DirFileSystem._relpath. Something like:

    def _relpath(self, path):
        if isinstance(path, str):
            if not self.path:
                return path
            # We need to account for S3FileSystem returning paths that do not
            # start with a '/'
            if path == self.path or (self.path.startswith(self.fs.sep) and path == self.path[1:]):
                return ""
            prefix = self.path + self.fs.sep
            if self.path.startswith(self.fs.sep) and not path.startswith(self.fs.sep):
                prefix = prefix[1:]
            assert path.startswith(prefix)
            return path[len(prefix) :]
        return [self._relpath(_path) for _path in path]

I'll file a PR with this change.

metadaddy avatar Jun 27 '24 20:06 metadaddy