filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

LocalFileSystem inconsistent with Windows paths

Open kasuteru opened this issue 3 years ago • 10 comments

from fsspec.implementations.local import LocalFileSystem
fs = LocalFileSystem()
fs.ls("C:/")

This lists the content of os.getcwd() instead of C:/. The same for similar commands like

fs.ls("file://C:/")

pretty sure this is a bug?

kasuteru avatar Apr 19 '22 07:04 kasuteru

I suppose the Windows path is actually r"C:\" ?

martindurant avatar Apr 19 '22 13:04 martindurant

No, that doesn't work either. Your version raises an error because even in raw paths, the backslash cannot be at the end. And fs.ls("C:\\") still outputs the contents of cwd instead of C:.

kasuteru avatar Apr 21 '22 08:04 kasuteru

Any update on this?

kasuteru avatar Aug 05 '22 19:08 kasuteru

Any windows path expert care to have a look?

martindurant avatar Aug 05 '22 19:08 martindurant

@kasuteru Seems like you are pretty deep into it already. Maybe you could try debugging it further? That would be the fastest way to resolve this.

efiop avatar Aug 05 '22 19:08 efiop

Not an expert, but I traced it, and the problem seems to be the following:

1.) os.listdir("C:") actually returns the contents of os.getcwd(). Not a problem in itself, but: 2.) This line is causing the problem:

https://github.com/fsspec/filesystem_spec/blob/5d6c87e562396f352645df7cd555a2b4b6eec84a/fsspec/implementations/local.py#L196

It strips away the "/", meaning what ends up getting passed to os.listdir() is "C:" (see 1.)

kasuteru avatar Aug 05 '22 19:08 kasuteru

If I implemented this from scratch, I would probably go with pathlib.Path instead of strings, since there a lot of special cases regarding file systems and strings (which pathlib handles implicitely by using WindowsPath and UnixPath). Since the code already exists, I am trying to think of a way to handle this edge case without breaking all the other functionality...

kasuteru avatar Aug 05 '22 19:08 kasuteru

pathlib is oppressively slow compared to strings. We endeavour to convert all paths to posix at the earliest opportunity, so we only have to deal with that from then on.

martindurant avatar Aug 05 '22 20:08 martindurant

It looks like all occurences of rstrip("/") in https://github.com/fsspec/filesystem_spec/blob/5d6c87e562396f352645df7cd555a2b4b6eec84a/fsspec/implementations/local.py are unnecessary except the one in _strip_protocol? Since the very last command of _strip_protocol is already rstrip, and rstrip is only ever called right after _strip_protocol?

kasuteru avatar Aug 05 '22 20:08 kasuteru

Investigating further: Since this is windows-specific behavior, the fix should probably go into _strip_protocol instead of appending rstrip afterwards. However, this might have unintended consequences, since some other functions test for the existence of /.

Basically, this is an edge case that pathlib implements, and fsspec somehow would have to mirror. In the screenshot below, note that pathlib never appends a trailing / to the path, except when it is a pure disk drive path:

image

kasuteru avatar Aug 05 '22 20:08 kasuteru