Question regarding call of stat() for parent dir

Open tfelbr opened this issue 1 year ago • 2 comments

Hello, while using sshfs in fsspec.open_files(), I discovered that stat() is called on the parent directory of the requested files, even when it is already clear that this must be a directory. While this is most likely not a problem in most cases, the sftp server I have to use behaves somewhat strangely here: I get a permission error when calling stat() on these directories.

When using the default sftp implementation from fsspec there is no issue at all, so it seems to me that this should be possible without a call to stat(). Is there any way to achieve this with this library as well? I would really like to use it for performance reasons compared to sftp. Thank you!

tfelbr, Sep 08 '23 14:09

Hi @Bizarious. Sounds like a bug. Maybe you could pinpoint the specific line in the code? If you are getting a permission error, I suppose you have a traceback for that lying around as well?

efiop, Sep 08 '23 15:09

Sorry for the late reply; some external circumstances prevented me from responding.

First of all, thank you for the answer! I think a bit more context would be helpful as well:

I'm using fsspec.open_files() with a URL that looks like this:

ssh://user:password@sftp_host/root/path/*.zip

Now it seems the filesystem calls stat() on the directory path (/root/path/ in the example above) even though that should not be necessary. The relevant part of the traceback looks like this:

  File ".../lib/python3.10/site-packages/fsspec_sync/sync.py", line 128, in fsspec_sync
    source_open_files: OpenFiles = fsspec.open_files(
  File ".../lib/python3.10/site-packages/fsspec/core.py", line 282, in open_files
    fs, fs_token, paths = get_fs_token_paths(
  File ".../lib/python3.10/site-packages/fsspec/core.py", line 641, in get_fs_token_paths
    paths = [f for f in sorted(fs.glob(paths)) if not fs.isdir(f)]
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 775, in _glob
    allpaths = await self._find(
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 841, in _find
    if withdirs and path != "" and await self._isdir(path):
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 652, in _isdir
    return (await self._info(path))["type"] == "directory"
  File ".../lib/python3.10/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File ".../lib/python3.10/site-packages/sshfs/spec.py", line 141, in _info
    attributes = await channel.stat(path)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 4573, in stat
    return await self._handler.stat(path, flags)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2695, in stat
    return cast(SFTPAttrs,  await self._make_request(
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2454, in _make_request
    result = self._packet_handlers[resptype](self, resp)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2470, in _process_status
    raise exc
asyncssh.sftp.SFTPPermissionDenied: Permission denied.

Using the debugger, I found that fsspec splits the path in the _glob function and calls _find() on the directory, /root/path/ in our case. _find() then calls _isdir() on that path, which in turn calls _info() of the ssh filesystem. That leads to a stat() call on this directory, which results in a permission error in my case. The relevant line in sshfs is line 141 of sshfs/spec.py.
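
To make that call chain concrete, here is a minimal, self-contained sketch. It is not the fsspec or sshfs code: ToyRemoteFS, try_glob and everything inside them are hypothetical stand-ins, and only the `withdirs` check in _find is copied from the traceback above. I am also assuming that _glob passes withdirs=True to _find, which I have not verified.

```python
import asyncio
import fnmatch
import posixpath


class ToyRemoteFS:
    """Hypothetical stand-in for an async remote filesystem (not fsspec/sshfs)."""

    def __init__(self, files, stat_denied=()):
        self.files = set(files)              # files that exist on the fake server
        self.stat_denied = set(stat_denied)  # paths for which stat() is refused
        self.stat_calls = []                 # every path passed to stat(), in order

    async def _stat(self, path):
        # Plays the role of the SFTP stat request.
        self.stat_calls.append(path)
        if path in self.stat_denied:
            raise PermissionError(f"Permission denied: {path}")
        kind = "file" if path in self.files else "directory"
        return {"name": path, "type": kind}

    async def _info(self, path):
        return await self._stat(path)

    async def _isdir(self, path):
        return (await self._info(path))["type"] == "directory"

    async def _find(self, path, withdirs=False):
        # Same condition as in the traceback: the search root itself is
        # checked with _isdir() before any listing happens.
        if withdirs and path != "" and await self._isdir(path):
            pass  # the real code would record the directory here
        return sorted(f for f in self.files if posixpath.dirname(f) == path)

    async def _glob(self, pattern):
        # Split "/root/path/*.zip" into the directory and the name pattern.
        root, name_pattern = posixpath.split(pattern)
        allpaths = await self._find(root, withdirs=True)  # assumption: withdirs=True
        return [
            p for p in allpaths
            if fnmatch.fnmatch(posixpath.basename(p), name_pattern)
        ]


def try_glob(fs, pattern):
    """Run the glob and report either the matches or the error type."""
    try:
        return ("ok", asyncio.run(fs._glob(pattern)))
    except PermissionError as exc:
        return ("error", type(exc).__name__)


if __name__ == "__main__":
    files = ["/root/path/a.zip", "/root/path/b.txt"]
    fs = ToyRemoteFS(files, stat_denied=["/root/path"])
    print(try_glob(fs, "/root/path/*.zip"))
    print(fs.stat_calls)
```

In this toy version the glob fails before any file is listed, because the only stat() issued is the one on the parent directory. If stat() on that directory is allowed, the same glob returns the matching files.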

Of course we are talking about the async implementations of glob() and find(), but I compared them to the synchronous ones and they look mostly the same, especially the call to isdir().

I am not sure whether anything can be done inside the ssh implementation, but as I already mentioned, the default sftp implementation does not have this problem. Please let me know what you think and whether I missed anything! Thanks :)

tfelbr, Sep 22 '23 15:09