
Ability to check if a filesystem implements an interface

Open lobis opened this issue 1 year ago • 8 comments

Apologies if this is currently possible, I couldn't find a clean way to achieve this.

I think it would be useful to have a way to check if a given filesystem implements a given interface (e.g. is cat_file implemented for the ssh protocol?).

Currently, calling an interface that is not implemented raises a NotImplementedError, because the method is defined in the base class. Checking whether a method with the requested name is present on the class always returns True, since it is defined in the base class (but not implemented). I guess there are some workarounds such as using the inspect module but

My use case is a library that uses fsspec for file access and relies on specific interfaces. If the interface is not present for the filesystem it would be nice to throw an exception pointing the user to the specific fsspec protocol and method that needs to be implemented (so they can open an issue to request it, if it makes sense).

This can currently be achieved by wrapping the code in a try/except block and checking for NotImplementedError, but personally I don't find this solution very clean. To avoid wrapping every call, one could instead run this try/except check once on initialization (by calling the method with trivial arguments). However, that adds a delay whenever the method is implemented, which would be most cases, so I don't like this either.
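
For illustration, the try/except workaround described above might look roughly like the sketch below. It uses small stand-in classes rather than fsspec itself; `BaseFS`, `PartialFS`, `MissingInterfaceError` and `call_or_explain` are hypothetical names invented for this example, and the stand-in `protocol` attribute is a plain string.

```python
class BaseFS:
    """Hypothetical stand-in for a filesystem base class (not fsspec itself)."""

    protocol = "base"

    def cat_file(self, path):
        # Declared on the base class but left unimplemented.
        raise NotImplementedError


class PartialFS(BaseFS):
    """Hypothetical backend that does not override cat_file."""

    protocol = "partial"


class MissingInterfaceError(RuntimeError):
    """Hypothetical error naming the protocol and method that are missing."""


def call_or_explain(fs, method_name, *args, **kwargs):
    """Call fs.<method_name>, turning NotImplementedError into a clearer error."""
    try:
        return getattr(fs, method_name)(*args, **kwargs)
    except NotImplementedError as exc:
        raise MissingInterfaceError(
            f"protocol {fs.protocol!r} does not implement {method_name!r}"
        ) from exc
```

Calling `call_or_explain(PartialFS(), "cat_file", "/tmp/file")` raises `MissingInterfaceError` with a message naming both the protocol and the method, which is the kind of pointer the use case above asks for. The cost is that every call site has to go through the wrapper.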

There is already a nice way to check whether a protocol is available in fsspec using the registry. I guess this could be an extension of that feature, where the individual filesystems register which interfaces they support.

lobis avatar Nov 02 '23 15:11 lobis

I think the short answer is: we should instead make sure that all backends support the full AbstractFileSystem interface, with upstream default implementations when appropriate. I would be surprised if cat_file doesn't work for ssh, it should.

martindurant avatar Nov 02 '23 15:11 martindurant

(yes, I realise this doesn't answer your question, and I don't really know how you would go about it; you can always check if Backend.method is AbstractFileSystem.method, but those base methods are supposed to work! Effectively, AbstractFileSystem is an ABC, and all methods with NotImplemented should be overridden in all subclasses)
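
A minimal sketch of that identity check, using stand-in classes instead of real fsspec backends (`BaseFS`, `PartialFS` and `overrides` are hypothetical names for this example):

```python
class BaseFS:
    """Hypothetical stand-in for the base class."""

    def cat_file(self, path):
        raise NotImplementedError

    def ls(self, path):
        raise NotImplementedError


class PartialFS(BaseFS):
    """Hypothetical backend that overrides ls but not cat_file."""

    def ls(self, path):
        return []


def overrides(cls, base, method_name):
    """Return True if cls (or a parent between cls and base) replaces base's method."""
    # Looking a method up on the class returns the plain function object,
    # so an identity comparison tells us whether it was redefined.
    return getattr(cls, method_name) is not getattr(base, method_name)
```

Here `overrides(PartialFS, BaseFS, "ls")` is true and `overrides(PartialFS, BaseFS, "cat_file")` is false. As noted above, "not overridden" only means "not implemented" when the base method raises NotImplementedError; a base method with a working default would also show up as not overridden.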

martindurant avatar Nov 02 '23 15:11 martindurant

> I think the short answer is: we should instead make sure that all backends support the full AbstractFileSystem interface, with upstream default implementations when appropriate. I would be surprised if cat_file doesn't work for ssh, it should.

That is good to hear, I thought there was some limitation as to why it wasn't implemented. I don't think it's implemented though.

import fsspec

print(f"fsspec version: {fsspec.__version__}") # fsspec version: 2023.10.0

host = "some_host"

fs = fsspec.filesystem(protocol="ssh", host=host)

# This works
fs.ls("/tmp")

# This raises a NotImplementedError
fs.cat_file("tmp/file")

lobis avatar Nov 02 '23 15:11 lobis

I am confused, because AbstractFileSystem.cat_file does exist and has no NotImplemented in it. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L752

martindurant avatar Nov 02 '23 15:11 martindurant

> I am confused, because AbstractFileSystem.cat_file does exist and has no NotImplemented in it. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L752

>>> fs.cat_file("tmp/file")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 432, in _cat_file
    raise NotImplementedError
NotImplementedError

The not implemented error comes from here: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L431-L432
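
The shape of that traceback (a blocking `cat_file` wrapper that ends in an unimplemented `_cat_file` coroutine) can be reproduced with a small stand-in. This is only a sketch of the pattern, not fsspec's own code: `AsyncBaseFS`, `IncompleteFS`, `CompleteFS` and `coroutine_overridden` are hypothetical names, and `asyncio.run` stands in for whatever event-loop handling the real library uses.

```python
import asyncio


class AsyncBaseFS:
    """Hypothetical stand-in for an async base class."""

    async def _cat_file(self, path):
        # The coroutine is declared on the base class but not implemented.
        raise NotImplementedError

    def cat_file(self, path):
        # Blocking wrapper: always present, it just delegates to the coroutine.
        return asyncio.run(self._cat_file(path))


class IncompleteFS(AsyncBaseFS):
    """Hypothetical backend that never overrides _cat_file."""


class CompleteFS(AsyncBaseFS):
    """Hypothetical backend that provides the coroutine."""

    async def _cat_file(self, path):
        return b"data"


def coroutine_overridden(cls, name):
    """Check the underscore-prefixed coroutine rather than the public wrapper."""
    return getattr(cls, "_" + name) is not getattr(AsyncBaseFS, "_" + name)
```

With this layout, `hasattr(IncompleteFS, "cat_file")` is true even though calling it raises NotImplementedError, so any override check would have to look at `_cat_file` rather than at the public method.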

lobis avatar Nov 02 '23 15:11 lobis

Now I'm more confused, because SFTPFileSystem is not async

martindurant avatar Nov 02 '23 16:11 martindurant

> Now I'm more confused, because SFTPFileSystem is not async

Sounds like a bug then. Can you confirm that you can reproduce this on your end? Otherwise, I think I know what may be going on: I installed sshfs (pip install sshfs) thinking it was the correct way to access ssh files (as s3fs is for s3).

  • https://pypi.org/project/sshfs/ (not sure where the source code is; it's not linked on the PyPI page).

lobis avatar Nov 02 '23 16:11 lobis

Ah, sshfs is not SFTPFileSystem. Post an issue on that repo; they should implement async _cat_file() there.

martindurant avatar Nov 02 '23 16:11 martindurant