filesystem_spec
Support a `get_filesystem` API?
We often have this pattern:
    from typing import Any
    import fsspec
    from fsspec.core import url_to_fs

    def get_filesystem(path: str, **kwargs: Any) -> fsspec.AbstractFileSystem:
        """Returns the appropriate filesystem to use when handling the given path."""
        fs, _ = url_to_fs(path, **kwargs)
        return fs
This would be more convenient for callers than having to handle the two outputs (filesystem, urlpath) returned by `url_to_fs`.
Would it be possible to offer this directly in the core fsspec APIs?
I suppose we could? I must say, the current function isn't too hard to use!
Possibly the scope of this could broaden to infer filesystems from the path if it's not fully qualified? For example, if it's a `pathlib.Path` object or the protocol is unqualified, prepend `file://`; if it ends with `.zip`, use a `ZipFileSystem`; `.tar($|\.)` for `TarFileSystem`; etc.? It wouldn't necessarily be perfect but would cover the vast majority of use cases, I think. Fsspec is great for being generic over different backends, but at present it requires the developer to know what the backend is to begin with.
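To make the suggestion concrete, here is a rough sketch of that inference. It is not an fsspec API: `infer_protocol`, `_ARCHIVE_RULES`, and `_PROTOCOL_RE` are hypothetical names, and the function returns a protocol name only, rather than constructing a filesystem. It assumes a path with an explicit protocol is left alone, and that a `pathlib` object is always local.

```python
import re
from pathlib import PurePath
from typing import Union

# Hypothetical suffix rules taken from the suggestion above; not part of fsspec.
_ARCHIVE_RULES = [
    (re.compile(r"\.zip$"), "zip"),
    (re.compile(r"\.tar($|\.)"), "tar"),
]
# Matches an explicit "protocol://" prefix such as "s3://" or "file://".
_PROTOCOL_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")


def infer_protocol(path: Union[str, PurePath]) -> str:
    """Guess a protocol name for ``path`` (hypothetical helper)."""
    if isinstance(path, PurePath):
        # Assumption: pathlib objects always refer to local paths.
        text = path.as_posix()
    else:
        text = str(path)
        match = _PROTOCOL_RE.match(text)
        if match:
            # Fully qualified already: trust the caller and infer nothing.
            return match.group(1)
    for pattern, protocol in _ARCHIVE_RULES:
        if pattern.search(text):
            return protocol
    # No protocol and no archive suffix: fall back to the local filesystem.
    return "file"


print(infer_protocol("s3://bucket/data.zip"))      # explicit protocol is kept
print(infer_protocol("data/archive.zip"))          # archive suffix
print(infer_protocol(PurePath("backup.tar.gz")))   # pathlib object
print(infer_protocol("notes.txt"))                 # plain local file
```

With these rules, a name like `my.tarball.txt` still maps to `file`, because the tar pattern requires `.tar` to be followed by a dot or the end of the string.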
I cannot think of any cases beyond archival FSs with special path suffixes. I'm not opposed, but I don't see it catching many cases. Where there is no protocol at all, LocalFileSystem is already assumed.
> at present it requires the developer to know what the backend is to begin with.
This is not the case with fsspec.open and GenericFileSystem. For the archival case, it is tricky: how do we tell the difference between someone who wants to access the contents as a filesystem versus someone who just wants to open the binary file and act on it as is (e.g., kerchunk would do this to extract offsets)?
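A small sketch of that ambiguity, assuming fsspec is installed and using a throwaway archive (`data.zip` and `inner.txt` are made-up names): the same local path can be read as an opaque binary file with `fsspec.open`, or mounted as a filesystem with the `zip` protocol. Nothing in the path itself says which reading the caller wants.

```python
import os
import tempfile
import zipfile

import fsspec

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive to work with (names are illustrative only).
    archive = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("inner.txt", "hello")

    # Reading 1: the archive is just a binary file. No protocol is given,
    # so the local filesystem is used and we get the raw zip bytes.
    with fsspec.open(archive, "rb") as f:
        magic = f.read(4)

    # Reading 2: the archive is a filesystem whose members can be read.
    zfs = fsspec.filesystem("zip", fo=archive)
    content = zfs.cat("inner.txt")

print(magic)    # first four raw bytes of the archive
print(content)  # bytes of the member file
```

A suffix-based `get_filesystem` would have to pick one of these readings by default, so an explicit opt-in for the archive behaviour seems safer than silent inference.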