filesystem_spec
filesystem_spec copied to clipboard
LocalFileSystem inconsistent with Windows paths
from fsspec.implementations.local import LocalFileSystem
fs = LocalFileSystem()
fs.ls("C:/")
This lists the content of os.getcwd() instead of C:/. The same for similar commands like
fs.ls("file://C:/")
pretty sure this is a bug?
I suppose the Windows path is actually r"C:\" ?
No, that doesn't work either. Your version raises an error because even in raw paths, the backslash cannot be at the end. And fs.ls("C:\\") still outputs the contents of cwd instead of C:.
Any update on this?
Any windows path expert care to have a look?
@kasuteru Seems like you are pretty deep into it already. Maybe you could try debugging it further? That would be the fastest way to resolve this.
Not an expert, but I traced it, and the problem seems to be the following:
1.) os.listdir("C:") actually returns the contents of os.getcwd(). Not a problem in itself, but: 2.) This line is causing the problem:
https://github.com/fsspec/filesystem_spec/blob/5d6c87e562396f352645df7cd555a2b4b6eec84a/fsspec/implementations/local.py#L196
It strips away the "/", meaning what ends up getting passed to os.listdir() is "C:" (see 1.)
If I implemented this from scratch, I would probably go with pathlib.Path instead of strings, since there a lot of special cases regarding file systems and strings (which pathlib handles implicitely by using WindowsPath and UnixPath). Since the code already exists, I am trying to think of a way to handle this edge case without breaking all the other functionality...
pathlib is oppressively slow compared to strings. We endeavour to convert all paths to posix at the earliest opportunity, so we only have to deal with that from then on.
It looks like all occurences of rstrip("/") in https://github.com/fsspec/filesystem_spec/blob/5d6c87e562396f352645df7cd555a2b4b6eec84a/fsspec/implementations/local.py are unnecessary except the one in _strip_protocol? Since the very last command of _strip_protocol is already rstrip, and rstrip is only ever called right after _strip_protocol?
Investigating further: Since this is windows-specific behavior, the fix should probably go into _strip_protocol instead of appending rstrip afterwards. However, this might have unintended consequences, since some other functions test for the existence of /.
Basically, this is an edge case that pathlib implements, and fsspec somehow would have to mirror. In the screenshot below, note that pathlib never appends a trailing / to the path, except when it is a pure disk drive path:
