filesystem_spec
FileNotFoundError raised if filename contains braces []
For example, uploading a file whose name contains braces raises FileNotFoundError when expand_path calls glob.
Steps to reproduce:
- Create a file with braces in the filename
- Make sure you can do ls <thisFile> in your shell
- Execute:
from fsspec.implementations.local import LocalFileSystem
fs = LocalFileSystem()
fs.expand_path("/home/user/path [with braces]")
I suppose that's getting caught up by the regex which is used for glob patterns. Does using escapes work?
fs.expand_path("/home/user/path \\[with braces\\]")
- that filename comes from some other call like ls or something like that
- the braces [] are valid characters in path right? At least touch [] works without escaping :)
- however, the main question is why this expand_path is used for, say, uploading a single file.
why this expand_path is used for, say, uploading a single file
We do not know whether a given path resolves to one or more files without trying to expand it. If you are certain you are working with one file, you can call the one-file variant directly and skip that step:
fs.put_file(local, remote)
the braces [] are valid characters in path right?
I think this is backend-specific
Hm. If I want to recursively upload a directory with braces?
You will probably need to do some string processing yourself, sorry. Probably find() works for you, or maybe you need to escape the special characters, and then pass the generated list of filenames to put, and I would expect that to work. Else, you could duplicate the code of put to prevent using the expand code.
I'm not sure I can think of a way to cope with your situation for all cases. The very easiest thing you can do is to rename the files to contain only regular characters!
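A minimal sketch of the "do the listing yourself" route suggested above, assuming the per-file call fs.put_file(local, remote) mentioned earlier in the thread. The helper names iter_upload_pairs and upload_tree are hypothetical, not part of fsspec. os.walk treats names literally, so "[" and "]" never reach any glob code:

```python
import os


def iter_upload_pairs(local_root, remote_root):
    """Yield (local_path, remote_path) for every file under local_root.

    os.walk never interprets names as patterns, so brackets are safe.
    """
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in sorted(filenames):
            local = os.path.join(dirpath, name)
            rel = os.path.relpath(local, local_root)
            # Assumption: the remote side uses "/" as its separator.
            remote = remote_root.rstrip("/") + "/" + rel.replace(os.sep, "/")
            yield local, remote


def upload_tree(local_root, remote_root, put_one):
    """Call put_one(local, remote) once per file, with no path expansion."""
    for local, remote in iter_upload_pairs(local_root, remote_root):
        put_one(local, remote)
```

With an fsspec filesystem this would be called as upload_tree("/home/user/path [with braces]", "some/remote/dir", fs.put_file). Whether the backend creates intermediate directories on its own is not covered here and may need handling.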
I agree the limitation on file names is backend-specific, but this expansion behaviour is implemented in AbstractFileSystem. I would expect .upload() to just, hm, try to upload (for instance, recursively) and throw some exception if the remote backend FS rejects the call.
I would add that this path expansion in this call is absolutely unexpected behaviour, since the "ls"-like call from another library returns this unescaped and does file operations with unescaped filename.
(sorry) one more comment, the call fsspec.filesystem("file").ls("/tmp") returns the names unescaped as well:
['/tmp/[a].csv', '/tmp/[].csv']
It makes the API a bit inconsistent.
for f in fsspec.filesystem("file").ls("/tmp"):
    print(f)
    fsspec.filesystem("file").upload(f, f)
/tmp/[a].csv
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "/home/REDACTED/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 1223, in upload
    return self.put(lpath, rpath, recursive=recursive, **kwargs)
  File "/home/REDACTED/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 836, in put
    lpaths = fs.expand_path(lpath, recursive=recursive)
  File "/home/REDACTED/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 886, in expand_path
    out = self.expand_path([path], recursive, maxdepth)
  File "/home/REDACTED/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 910, in expand_path
    raise FileNotFoundError(path)
FileNotFoundError: ['/tmp/[a].csv']
The trouble is, I cannot think of a way to provide glob-like behaviour, in which "[", "]" are special, at the same time as passing those special characters through to the backend unaltered.
"ls"-like call from another library
bash expands to "[a].file" sometimes! I actually can't figure out bash's behaviour here, except that I suppose it deems it OK to test the existence of all files (we don't, typically).
I think your suggestion would amount to interpreting the URL as a glob string, then interpreting it as a bare regex, and then adding into the list of files those files which exactly match the original. This will still miss paths given as a mix of special characters and characters intended for expansion.
Note: python's own glob doesn't find such a file:
In [6]: "[a].file" in os.listdir('.')
Out[6]: True
In [7]: glob.glob("[a].file")
Out[7]: []
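For comparison, the standard library does offer glob.escape, which rewrites the special characters so a name is matched literally. A small sketch with stdlib glob only; literal_glob is a hypothetical helper name, and whether fsspec's own glob honours the same "[[]" escaping is an assumption that the thread does not confirm:

```python
import glob


def literal_glob(path):
    """Treat path as a literal name rather than a pattern.

    glob.escape wraps "*", "?" and "[" in brackets, for example
    "[a].file" becomes "[[]a].file", so stdlib glob matches the
    file actually named "[a].file".
    """
    return glob.glob(glob.escape(path))
```

Calling glob.glob("[a].file") looks for a file named "a.file", because "[a]" is read as a character class; literal_glob("[a].file") looks for the bracketed name itself.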
I think glob expects a pattern, while ls does not. And the result of globbing is unescaped as well:
glob.glob("/tmp/*.csv")
['/tmp/[a].csv', '/tmp/[].csv']
But you are talking about put, which does accept glob patterns, so that you can do the equivalent of scp *.csv remote as fs.put("*.csv", remote) - which probably works for your bracket-containing files too.
In order to allow patterns in put, we need to do a glob. The only way to tell if a path should be passed to glob is to see if it has any special characters in: *, ?, [.
In short, I do not have a solution, sorry. If you can think of a way to satisfy all the needs, I'd be happy to see it.
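The detection step described above can be sketched as follows. This is an illustration of the idea only, not fsspec's actual code, and has_glob_chars is a hypothetical name:

```python
# The three characters named in the thread as marking a glob pattern.
GLOB_CHARS = frozenset("*?[")


def has_glob_chars(path):
    """Return True if path contains any character that glob treats specially."""
    return any(ch in GLOB_CHARS for ch in path)
```

Under this test "/tmp/*.csv" and "/tmp/[a].csv" are indistinguishable: both contain a special character, so both would be handed to glob, which is exactly the ambiguity being discussed.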
What about just doing something like expand_path(..., glob=True) so the caller can pass glob=False into get/put/expand_path when they want the recursive dir expansion but are explicitly not using glob patterns?
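To make that proposal concrete, here is a rough local-only sketch built on the standard library. expand_path_sketch and its use_glob parameter are hypothetical stand-ins for the suggested glob=True/False flag; this is not fsspec's expand_path and it leaves out recursive directory expansion:

```python
import glob
import os


def expand_path_sketch(path, use_glob=True):
    """Expand path to a list of existing paths.

    With use_glob=False the path is taken literally, so names that
    contain "[" or "]" are checked for existence instead of being
    interpreted as patterns.
    """
    if use_glob and any(ch in path for ch in "*?["):
        matches = sorted(glob.glob(path))
        if not matches:
            raise FileNotFoundError(path)
        return matches
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return [path]
```

A caller who knows the name is literal would pass use_glob=False; everyone else keeps the pattern behaviour by default.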
@ianthomas23 - it sounds painful to add an expand=bool flag all over the place to cover special filenames. Do you think escaping the regex special characters would do the trick?
@ianthomas23 Do you think escaping the regex special characters would do the trick?
Yes I think that will work. I'll look into it.