filesystem_spec
Ability to check if a filesystem implements an interface
Apologies if this is currently possible, I couldn't find a clean way to achieve this.
I think it would be useful to have a way to check whether a given filesystem implements a given interface (e.g. is cat_file implemented for the ssh protocol?).
Currently, calling a method that is not implemented raises a NotImplementedError, since the method is defined in the base class. Checking whether a method with the requested name is present on the class will always return True, since it's defined in the base class (but not implemented). I guess there are some workarounds, such as using the inspect module.
My use case is a library that uses fsspec for file access and relies on specific interfaces. If the interface is not present for the filesystem, it would be nice to raise an exception pointing the user to the specific fsspec protocol and method that needs to be implemented (so they can open an issue to request it, if it makes sense).
This can currently be achieved by wrapping the code in a try/except block and checking for NotImplementedError, but personally I don't find this solution very clean. To avoid wrapping every call, one could also run this try/except check once on initialization (calling the method with trivial arguments), but this may introduce some delay when the method is implemented, which would be most cases, so I don't like this either.
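The try/except approach described above could be sketched roughly as follows. This is only an illustration: `call_or_explain`, `UnsupportedOperation` and the stand-in `FakeSSHFileSystem` are hypothetical names, not part of fsspec. The only things assumed about a real filesystem object are that it exposes a `protocol` attribute and raises `NotImplementedError` for unsupported methods.

```python
class UnsupportedOperation(RuntimeError):
    """Hypothetical error type for this sketch; not part of fsspec."""


def call_or_explain(fs, method_name, *args, **kwargs):
    """Call fs.<method_name>(...) and turn NotImplementedError into a clearer message."""
    method = getattr(fs, method_name)
    try:
        return method(*args, **kwargs)
    except NotImplementedError as err:
        # Fall back to the class name if the object has no protocol attribute.
        protocol = getattr(fs, "protocol", type(fs).__name__)
        raise UnsupportedOperation(
            f"{method_name!r} is not implemented for protocol {protocol!r} "
            f"({type(fs).__name__}); consider opening an issue with that backend."
        ) from err


class FakeSSHFileSystem:
    """Stand-in for a backend that lacks cat_file (illustration only)."""

    protocol = "ssh"

    def ls(self, path):
        return [f"{path}/file"]

    def cat_file(self, path):
        raise NotImplementedError


fs = FakeSSHFileSystem()
listing = call_or_explain(fs, "ls", "/tmp")  # implemented, so the call goes through
try:
    call_or_explain(fs, "cat_file", "/tmp/file")
except UnsupportedOperation as err:
    message = str(err)  # names both the method and the protocol
```

One caveat of this sketch: a `NotImplementedError` raised deeper inside an otherwise implemented method would be reported the same way.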
There already exists a nice way to check if a protocol is available in fsspec using the registry. I guess this could be an extension of that feature, where the individual filesystems register which interfaces they support.
I think the short answer is: we should instead make sure that all backends support the full AbstractFileSystem interface, with upstream default implementations when appropriate. I would be surprised if cat_file doesn't work for ssh, it should.
(yes, I realise this doesn't answer your question, and I don't really know how you would go about it; you can always check if Backend.method is AbstractFileSystem.method, but those base methods are supposed to work! Effectively, AbstractFileSystem is an ABC, and all methods with NotImplemented should be overridden in all subclasses)
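The identity check mentioned above might look like the sketch below. It uses stand-in classes so that it runs without fsspec installed; `is_overridden`, `Base` and the two backends are hypothetical names. With fsspec one would presumably pass `fsspec.spec.AbstractFileSystem` as the base. Note that this only tells you whether a subclass replaces the base method, not whether the call will actually succeed.

```python
def is_overridden(cls, base, method_name):
    """Return True if cls provides its own version of base.<method_name>."""
    return getattr(cls, method_name) is not getattr(base, method_name)


class Base:
    """Stand-in for an abstract base class (illustration only)."""

    def cat_file(self, path):
        raise NotImplementedError


class FullBackend(Base):
    """Overrides cat_file with a working implementation."""

    def cat_file(self, path):
        return b"data"


class PartialBackend(Base):
    """Inherits the unimplemented cat_file from Base."""


# hasattr cannot tell the two backends apart: both inherit the name.
both_have_attr = hasattr(FullBackend, "cat_file") and hasattr(PartialBackend, "cat_file")

# The identity comparison does distinguish them.
full = is_overridden(FullBackend, Base, "cat_file")
partial = is_overridden(PartialBackend, Base, "cat_file")
```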
That is good to hear, I thought there was some limitation as to why it wasn't implemented. I don't think it's implemented though.
import fsspec
print(f"fsspec version: {fsspec.__version__}") # fsspec version: 2023.10.0
host = "some_host"
fs = fsspec.filesystem(protocol="ssh", host=host)
# This works
fs.ls("/tmp")
# This raises a NotImplementedError
fs.cat_file("tmp/file")
I am confused, because AbstractFileSystem.cat_file does exist and has no NotImplemented in it. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L752
>>> fs.cat_file("tmp/file")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/fsspec/asyn.py", line 432, in _cat_file
    raise NotImplementedError
NotImplementedError
The not implemented error comes from here: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L431-L432
Now I'm more confused, because SFTPFileSystem is not async
Sounds like a bug then. Can you confirm you can reproduce this on your end? Otherwise, I think I know what may be going on: I installed sshfs (pip install sshfs), thinking it was the correct way to access ssh files (as s3fs is for s3).
- https://pypi.org/project/sshfs/ (not sure where the source code is, it's not linked in the pypi page).
Ah, sshfs is not SFTPFileSystem. Post an issue on that repo; they should implement async _cat_file() there.