Support a `get_filesystem` API?

Open ananthsub opened this issue 2 years ago • 3 comments

We often have this pattern:

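from typing import Any
import fsspec
from fsspec.core import url_to_fs

def get_filesystem(path: str, **kwargs: Any) -> fsspec.AbstractFileSystem:
    """Returns the appropriate filesystem to use when handling the given path."""
    fs, _ = url_to_fs(path, **kwargs)  # discard the stripped urlpath
    return fs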

This would be more convenient for callers than unpacking the (filesystem, urlpath) tuple returned by url_to_fs. Would it be possible to offer this directly in the core fsspec APIs?

ananthsub avatar Oct 08 '22 00:10 ananthsub

I suppose we could? I must say, the current function isn't too hard to use!

martindurant avatar Oct 12 '22 13:10 martindurant

Possibly the scope of this could broaden to infer the filesystem from the path when it's not fully qualified? For example: if it's a pathlib.Path object or has no protocol, prepend file://; if it ends with .zip, use a ZipFileSystem; if it matches .tar($|\.), use a TarFileSystem; and so on. It wouldn't necessarily be perfect, but I think it would cover the vast majority of use cases. Fsspec is great for being generic over different backends, but at present it requires the developer to know what the backend is to begin with.
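
A minimal sketch of the heuristic described above, using only the standard library. The helper name `infer_protocol` and its exact rules are hypothetical and not part of fsspec. It only returns a protocol name (`"file"`, `"zip"`, `"tar"`, or whatever prefix the caller wrote explicitly) and leaves constructing the filesystem to the caller.

```python
import re
from pathlib import Path
from typing import Union

# Hypothetical helper, NOT an fsspec API: guesses a protocol name from a path.
_EXPLICIT_PROTOCOL = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*)://")
_TAR_SUFFIX = re.compile(r"\.tar($|\.)")  # matches .tar, .tar.gz, .tar.bz2, ...


def infer_protocol(path: Union[str, Path]) -> str:
    """Return a protocol name inferred from an explicit prefix or a suffix."""
    text = str(path)

    # An explicit protocol always wins over any suffix-based guess.
    explicit = _EXPLICIT_PROTOCOL.match(text)
    if explicit:
        return explicit.group(1)

    # No protocol: look at the final path component for archive suffixes.
    name = text.rsplit("/", 1)[-1]
    if name.endswith(".zip"):
        return "zip"
    if _TAR_SUFFIX.search(name):
        return "tar"

    # Fall back to the local filesystem, as fsspec already assumes.
    return "file"


print(infer_protocol("s3://bucket/key.zip"))       # s3
print(infer_protocol(Path("data/archive.zip")))    # zip
print(infer_protocol("backup.tar.gz"))             # tar
print(infer_protocol("/tmp/notes.txt"))            # file
```

Letting an explicit protocol override the suffix is a design choice of this sketch; the comment above does not say which should take precedence.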

clbarnes avatar Feb 15 '23 12:02 clbarnes

I cannot think of any cases beyond archival FSs with special path suffixes. I'm not opposed, but I don't see it catching many cases. Where there is no protocol at all, LocalFileSystem is already assumed.

> at present it requires the developer to know what the backend is to begin with.

This is not the case with fsspec.open and GenericFileSystem. For the archival case, it is tricky: how do we tell the difference between someone who wants to access the contents as a filesystem versus someone who just wants to open the binary file and act on it as is (e.g., kerchunk would do this to extract offsets)?
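
To make the ambiguity concrete, here is a small sketch of the two intents applied to the same .zip file. It assumes fsspec is installed; the temporary archive, the member name `hello.txt`, and the function name `two_intents` are illustrative only. The path alone cannot say which of the two the caller wants, so each intent is spelled out explicitly.

```python
import os
import tempfile
import zipfile

import fsspec


def two_intents():
    """Open one archive both as raw bytes and as a filesystem."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("hello.txt", "hi")

        # Intent 1: treat the archive as an ordinary binary file.
        with fsspec.open(archive, "rb") as f:
            magic = f.read(4)

        # Intent 2: treat the archive's contents as a filesystem.
        zfs = fsspec.filesystem("zip", fo=archive)
        inner = zfs.cat("hello.txt")

    return magic, inner


magic, inner = two_intents()
print(magic)  # b'PK\x03\x04'
print(inner)  # b'hi'
```

A suffix-based rule would always pick the second intent, which is the wrong choice for a tool that needs the raw bytes.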

martindurant avatar Feb 16 '23 14:02 martindurant