universal_pathlib icon indicating copy to clipboard operation
universal_pathlib copied to clipboard

Custom subclasses now require a `fsspec` filesystem

Open joouha opened this issue 2 years ago • 3 comments

Previously it used to be possible to create arbitrary UPath subclasses like this:

from upath import UPath
from upath.registry import register_implementation

class MyPath(UPath):
    def exists(self):
        return True

register_implementation("my", MyPath)

print(UPath("my://path").exists())

However, UPath now checks for and requires a corresponding fsspec filesystem in the UPath._parse_storage_options method, raising a ValueError if one is not found:

Traceback (most recent call last):
  File "/home/josiah/code/universal_pathlib/./test.py", line 12, in <module>
    print(UPath("my://path").exists())
          ^^^^^^^^^^^^^^^^^^
  File "/home/josiah/code/universal_pathlib/upath/core.py", line 203, in __init__
    self._storage_options = type(self)._parse_storage_options(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/josiah/code/universal_pathlib/upath/core.py", line 290, in _parse_storage_options
    fs_cls: type[AbstractFileSystem] = get_filesystem_class(protocol)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/josiah/code/universal_pathlib/venv/lib/python3.11/site-packages/fsspec/registry.py", line 233, in get_filesystem_class
    raise ValueError(f"Protocol not known: {protocol}")
ValueError: Protocol not known: my

Would you consider handling this case and returning an empty storage options dictionary if no fsspec filesystem is found?

...

    @classmethod
    def _parse_storage_options(
        cls, urlpath: str, protocol: str, storage_options: Mapping[str, Any]
    ) -> dict[str, Any]:
        """Parse storage_options from the urlpath"""
        try:
            fs_cls: type[AbstractFileSystem] = get_filesystem_class(protocol)
        except ValueError:
            return {}
        else:
            pth_storage_options = fs_cls._get_kwargs_from_urls(urlpath)
            return {**pth_storage_options, **storage_options}
...

Thanks, Josiah

joouha avatar Feb 19 '24 10:02 joouha

Yes, this is intentional, because we can't really predict what a filesystem class could extract from the uri as storage_options.

There are multiple ways to achieve what you want, including the one you suggest.

Could you describe your intention behind creating a UPath subclass for the "my" protocol, without defining a new AbstractFileSystem subclass for the "my" protocol?

ap-- avatar Feb 19 '24 20:02 ap--

The protocol I'm creating a new subclass for is actually untitled: (I was just using my:path as an example). For my use case, UntitledPath objects don't have to do anything other than not exist.

When communicating with a LSP server, you use the file URI to signal which file you are referring to. Typically you use file:// URIs for this, but for the case of new unsaved files, you use untitled: as the URI scheme. As part of implementing LSP server support in euporie, I added a new UntitledPath class to use with untitled files to facilitate this.

It doesn't really matter, since as you say above it's very easy to work around. I just wasn't sure if this was a deliberate change or not.

Feel free to close this

joouha avatar Feb 19 '24 22:02 joouha

Interesting! Thanks for sharing the use case.

cross-referencing: https://github.com/neovim/neovim/issues/21276 because there are quite a few useful links to more context.

So if the "untitled" protocol is supposed to be a placeholder for in-memory files not written to disk, wouldn't the "memory" filesystem be an ideal candidate for it?

from fsspec.registry import register_implementation as fsspec_register_implementation
from fsspec.implementations.memory import MemoryFileSystem


class UntitledFileSystem(MemoryFileSystem):
    store = {}
    pseudo_dirs = [""]
    protocol = "untitled"
    root_marker = "/"

    @classmethod
    def _strip_protocol(cls, path):    
        if path.startswith("untitled://"):
            path = path[len("untitled://") :]
        return super()._strip_protocol(path)

fsspec_register_implementation("untitled", UntitledFileSystem)


from upath import UPath
from upath.registry import register_implementation as upath_register_implementation
from upath.implementations.memory import MemoryPath


class UntitledPath(MemoryPath):
    pass


upath_register_implementation("untitled", UntitledPath)


u = UPath("untitled:///abc/file.txt")
print(repr(u))

ap-- avatar Feb 19 '24 22:02 ap--