universal_pathlib

SSH / SFTP Implementations

Open juftin opened this issue 4 months ago • 3 comments

The following script behaves differently in 0.0.24 and 0.2.2, and I'm trying to figure out the cleanest way to account for it:

import upath

ssh_path = upath.UPath("ssh://[email protected]:22/").resolve()

(Yeah, I know SFTP/SSH isn't currently implemented. I'd be happy to help implement it; it looks like it works pretty seamlessly without any custom functions.)

0.0.24

>>> ssh_path
UPath('ssh://[email protected]:22/')
>>> ssh_path.path
'/'
>>> str(ssh_path)
'ssh://[email protected]:22/'
>>> ssh_path.fs.ftp.normalize(ssh_path.path)
'/'
>>> upath.__version__
'0.0.24'

0.2.2

>>> ssh_path
UPath('ssh://.')
>>> ssh_path.path
'.'
>>> str(ssh_path)
'ssh://.'
>>> ssh_path.fs.ftp.normalize(ssh_path.path)
'/home/juftin'
>>> upath.__version__
'0.2.2'

The difference is that 0.2.2 assumes you start from the user's home directory and treats every path as relative to it. This means paths like ssh://[email protected]:22/var/local no longer work.
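
For reference, here is a minimal sketch of how the standard library splits a URL of this shape into connection details and a remote path. The host `user@example.com` is a placeholder, and this only illustrates the netloc/path split I would expect, not what upath does internally:

```python
from urllib.parse import urlsplit

# "user@example.com" is a placeholder host, not the one from this report.
parts = urlsplit("ssh://user@example.com:22/var/local")

netloc = parts.netloc  # connection details: user, host and port
path = parts.path      # absolute location on the remote filesystem

print(f"{netloc=}")
print(f"{path=}")
```

With the 0.0.24 behavior, the path component stays absolute instead of being interpreted relative to the home directory.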

In my case, the previous behavior, where the SFTP/SSH filesystem started at the root of the directory tree, was more intuitive. I've been working on getting this to work for browsr / textual-universal-directorytree, and the code below works for me:

from upath import UPath


class SFTPTextualPath(UPath):
    """
    SFTPTextualPath
    """

    @property
    def path(self) -> str:
        """
        Always return the path relative to the root
        """
        pth = super().path
        if pth.startswith("."):
            return f"/{pth[1:]}"
        elif pth.startswith("/"):
            return pth
        else:
            return "/" + pth

    def __str__(self) -> str:
        """
        Add the protocol prefix + extras to the string representation
        """
        string_representation = f"{self.protocol}://"
        if "username" in self.storage_options:
            string_representation += f"{self.storage_options['username']}@"
        string_representation += f"{self.storage_options['host']}"
        if "port" in self.storage_options:
            string_representation += f":{self.storage_options['port']}"
        string_representation += self.path
        return string_representation
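
The re-rooting done by the `path` property can also be tried on its own, without an SSH server. The sketch below uses a hypothetical helper, `to_rooted` (not part of upath), and swaps the prefix checks for `posixpath.normpath`, so that inputs such as `./var/local` or `.hidden` are handled as well. That is an assumption about which inputs might show up, not something I have verified against upath:

```python
import posixpath


def to_rooted(pth: str) -> str:
    """Re-anchor a path at "/" (hypothetical helper, not part of upath)."""
    # Drop leading slashes first: POSIX normpath preserves exactly two
    # leading slashes, which is not wanted here.
    rooted = "/" + pth.lstrip("/")
    # normpath collapses "." segments and duplicate separators.
    return posixpath.normpath(rooted)


for candidate in (".", "./var/local", "/var/local", "var/local", ".hidden"):
    print(f"{candidate!r} -> {to_rooted(candidate)!r}")
```

Using `posixpath` rather than `os.path` keeps the behavior the same on Windows clients, since remote SFTP paths use forward slashes.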

juftin, Mar 14 '24 19:03

Hi @juftin,

thank you for reporting! This is a "bug" in the sense that, even though ssh filesystems are untested, it should be possible to avoid this for all untested filesystems where the netloc would be stripped or end up in the path anchor.

Correct behavior can be restored by changing an internal setting in UPath's flavour implementation. Note that the following code fixes the behavior for ssh specifically:

import upath

# ------- v0.2.2 specific
# Please don't use these two lines in your code:

from upath._flavour import WrappedFileSystemFlavour
WrappedFileSystemFlavour.protocol_config["netloc_is_anchor"].add("ssh")

# There's going to be a correct fix implemented in `universal_pathlib`
# -------

ssh_path = upath.UPath("ssh://[email protected]:22/var/local")

print(f"{ssh_path.parts=}")  # ssh_path.parts=('/', 'var', 'local')
print(f"{ssh_path.root=}")   # ssh_path.root='/'

For a general fix, I need to check whether the netloc_is_anchor setting can be inferred from the return value of a filesystem's _strip_protocol method when it is given a generic URI with a non-empty netloc.
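
A rough sketch of what the probing step might look like. Everything here is hypothetical: `probe_netloc_handling` and the two stand-in strip functions are made up for illustration and are not upath or fsspec APIs. The sketch only reports whether a strip function keeps or drops the netloc; how that result would map onto netloc_is_anchor is still open:

```python
from typing import Callable
from urllib.parse import urlsplit


def probe_netloc_handling(strip_protocol: Callable[[str], str], protocol: str) -> str:
    """Report what a strip function does with the netloc of a synthetic URI."""
    netloc = "probe-netloc"  # synthetic, non-empty netloc
    stripped = strip_protocol(f"{protocol}://{netloc}/some/key")
    return "kept" if netloc in stripped else "stripped"


# Stand-in for a filesystem that drops the netloc (assumed ssh-like behavior).
def strip_like_ssh(uri: str) -> str:
    return urlsplit(uri).path


# Stand-in for a filesystem that keeps the netloc in the path (bucket-like).
def strip_like_bucket(uri: str) -> str:
    parts = urlsplit(uri)
    return parts.netloc + parts.path


print(probe_netloc_handling(strip_like_ssh, "ssh"))
print(probe_netloc_handling(strip_like_bucket, "s3"))
```

A real implementation would pass a filesystem class's own `_strip_protocol` instead of the stand-ins, and would need to be checked against every registered protocol.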

Either way, it would of course be great to add an SSHPath implementation and thoroughly test all path functionality.

Cheers, Andreas

ap--, Mar 14 '24 20:03

Oh, very neat. I saw the SFTPFileSystemFlavour but wasn't sure what might be needed there. I'd be happy to get an SSHPath implementation going. That would need this upstream netloc_is_anchor inference feature first though, right?

juftin, Mar 14 '24 23:03

that would need this upstream netloc_is_anchor inference feature first though, right?

No, not really. You could add "ssh" and "sftp" here: https://github.com/fsspec/universal_pathlib/blob/380144c18f291f0f0a15fe8a02bc265233dd594b/upath/_flavour.py#L106-L118

and make an SFTPPath / SSHPath implementation like the others. Most of the work will be making a test fixture that sets up a local ssh server to test against. This should be a good source for how to set that up: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/tests/test_sftp.py

ap--, Mar 15 '24 16:03