sshfs
sshfs copied to clipboard
Changes to be consistent with GenericFileSystem
Hi All.
While looking at using the GenericFileSystem
and its rsync
function, I've noticed a few inconsistencies that sshfs
has with handling paths with the protocol(sftp://
).
One fix was done in #43 but there appear to be a few more. One proposed solution was to take this upstream to the implementation of GenericFileSystem
(see this discussion). But it is becoming apparent that this should be handled in the filesystem implementation. In this case, sshfs
.
I was hoping it would be as easy as adding self._strip_protocol(lpath)
to the references in _get_file and
_put_file` and related methods, but i think it's a bit more nuanced than a first glance.
I've written a test case or two to illustrate the problem:
def test_path_with_protocol(fs: SSHFileSystem, remote_dir):
# this would have failed before PR 43
assert fs.isdir("/.") == fs.isdir("sftp:///.")
# absolute path
assert fs._strip_protocol("sftp:///.") == fs._strip_protocol("/.")
# blah is detected as host and removed, even though a relative path may have been intended?
assert fs._strip_protocol("sftp://blah") == fs._strip_protocol("")
# another example of a potentially intended relative path but parsed as absolute
assert fs._strip_protocol("sftp:///.") == fs._strip_protocol("/.")
The largest problem is with relative paths being passed in. The current _strip_protocol
doesn't seem to be relative path friendly. I imagine this is mostly a non-issue for folks not using the GenericFileSystem
or anything else that passes in protocol-aware paths. But, according to other FS implementations and the recommendation in this PR to fsspec, filesystem implementations should be able to handle a path with or without a protocol.
Curios on everyone's thoughts on doing this and if there are ideas on the best way to implement.