filesystem_spec
filesystem_spec copied to clipboard
Strip protocol from paths in fsspec.generic.GenericFileSystem._copy
Seems to fix a problem I was having.
Can't do it upstream in the rsync
method because at least one file needs the protocol to resolve the proper filesystem.
This is probably reasonable (even given a fix in sshfs), but it would be nice if we could engineer a case that failed before this fix and now passes, as a test against regression.
You are referencing this fix in sshfs for the _info
/isdir
stuff, right? Yeah, that fix got around one problem I was having with GenericFileSystem when it checks for a dir.
This fix is to get around another problem i was having after I resolved the sshfs one above.
And yeah, a breaking/fixing test case is a grand idea. I will work on that. I imagine I can use a mockssh server for this, perhaps? hmmm.
fs = fsspec.url_to_fs("sftp://user@host:1234")[0]
fsspec.generic.rsync(
"sftp:///stuff",
"gs://dbucket/dir1/dir2",
inst_kwargs={"default_method": "current"},
)
is what i was using locally to test this... i wonder if the existing test_rsync
isn't catching this here because it doesn't use the default_method=current
? the other tests, however, do but not rsync.
I imagine I can use a mockssh server for this, perhaps?
memory, local and ftp FSs are the typical ones used in tests (in that order). You could make a mock, but if any of those show the problem, it would be easier.
hmmm, seems like any filesystem that doesn't strip in their _info
or info
calls has a similar problem with GenericFileSystem
.
I decided to try this with the built in implementation of sftp instead of sshfs
and sure enough, same issue with isdir
failing and the same fix.
also was able to reproduce this in the test by commenting out the strip in the memory impl here.
with that line commented out the test fails in the same way as sshfs
and the sftp
impl.
i don't think the fix can be done in AbstractFileSystem
or AsyncAbstractFileSystem
so every impl needs to make sure they strip the protocol in their info
and _info
methods?
Yes I think all backends should be able to cope with paths with or without protocol specifiers.
bleh, this is becoming quite tricky to test. not certain how to do it.
would it be better to move this out of fsspec
and just add another fix to sshfs
? in sshfs
i think this will have to be done in almost all locations that use a path
.
If you can state the problem as "sshfs should always use _strip_protocol, but doesn't", then yes.
The logic in Generic changed recently, not to update the "name" field of the dircache of its target filesystems (part of https://github.com/fsspec/filesystem_spec/pull/1633 ). That might be enough to fix the issue here.