universal_pathlib
universal_pathlib copied to clipboard
SSH / SFTP Implementations
The following script works differently between 0.2.2
and 0.0.24
and I'm trying to figure the cleanest way to account for it:
import upath
ssh_path = upath.UPath("ssh://[email protected]:22/").resolve()
(yeah, I know SFTP/SSH isn't a current implementation - I'd be happy to help to implement it, it looks to work pretty seamlessly without any custom functions)
0.0.24
>>> ssh_path
UPath('ssh://[email protected]:22/)
>>> ssh_path.path
'/'
>>> str(ssh_path)
'ssh://[email protected]:22/'
>>> ssh_path.fs.ftp.normalize(ssh_path.path)
'/'
>>> upath.__version__
'0.0.24'
0.2.2
>>> ssh_path
UPath('ssh://.')
>>> ssh_path.path
'.'
>>> str(ssh_path)
'ssh://.'
>>> ssh_path.fs.ftp.normalize(ssh_path.path)
'/home/juftin'
>>> upath.__version__
'0.2.2'
The difference here is that 0.2.2
assumes you are starting from the user's home directory and every path is relative from there, this means paths like ssh://[email protected]:22/var/local
are no longer working.
In my case the previous way the SFTP/SSH filesystem went to the root of the directory tree was more intuitive. I've been working on getting this to work for browsr / textual-universal-directorytree , the below code works for me:
class SFTPTextualPath(UPath):
"""
SFTPTextualPath
"""
@property
def path(self) -> str:
"""
Always return the path relative to the root
"""
pth = super().path
if pth.startswith("."):
return f"/{pth[1:]}"
elif pth.startswith("/"):
return pth
else:
return "/" + pth
def __str__(self) -> str:
"""
Add the protocol prefix + extras to the string representation
"""
string_representation = f"{self.protocol}://"
if "username" in self.storage_options:
string_representation += f"{self.storage_options['username']}@"
string_representation += f"{self.storage_options['host']}"
if "port" in self.storage_options:
string_representation += f":{self.storage_options['port']}"
string_representation += self.path
return string_representation
Hi @juftin,
thank you for reporting! This is a "bug" in the sense, that even though ssh filesystems are untested, it should be possible to avoid this for all untested filesystems where the netloc would be stripped or end up in the path anchor.
Correct behavior can be restored by setting a special internal setting in UPath's flavour implementation. Note that the following code fixes behavior for ssh:
import upath
# ------- v0.2.2 specific
# Please don't use these two lines in your code:
from upath._flavour import WrappedFileSystemFlavour
WrappedFileSystemFlavour.protocol_config["netloc_is_anchor"].add("ssh")
# There's going to be a correct fix implemented in `universal_pathlib`
# -------
ssh_path = upath.UPath("ssh://[email protected]:22/var/local")
print(f"{ssh_path.parts=}") # ssh_path.parts=('/', 'var', 'local')
print(f"{ssh_path.root=}") # ssh_path.root='/'
For a general fix, I need to check if it's possible to infer the netloc_is_anchor
setting from the return value of a filesystem's _strip_protocol
method, if provided with a generic uri with non-empty netloc.
But anyways, it would of course be great to add an SSHPath implementation and thoroughly test all path functionality.
Cheers, Andreas
Oh very neat. I saw the SFTPFileSystemFlavour - but wasn't sure what might be needed there. I'd be happy to get a SSHPath implementation going - that would need this upstream netloc_is_anchor
inference feature first though, right?
that would need this upstream netloc_is_anchor inference feature first though, right?
No, not really. You could add "ssh"
and "sftp"
here:
https://github.com/fsspec/universal_pathlib/blob/380144c18f291f0f0a15fe8a02bc265233dd594b/upath/_flavour.py#L106-L118
and make a SFTPPath / SSHPath implementation like the others. Most of the work will be to make a test fixture that sets up a local ssh server to test against. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/tests/test_sftp.py this should be a good source for how to set this up.