pyfilesystem2
pyfilesystem2 copied to clipboard
Unable to handle fstat exceptions when using fs.walk.info
I'm using the following code to walk a filesystem:
for (path, info) in remote_fs.walk.info(namespaces=['details']):
It contains a broken symbolic link, and thus fstat()
fails, and interrupts the walk process with the following exception:
Traceback (most recent call last):
File "/home/jjmontes/git/sitetool/env/lib/python3.6/site-packages/fs-2.4.5-py3.6.egg/fs/osfs.py", line 492, in _scandir
stat_result = dir_entry.stat()
FileNotFoundError: [Errno 2] No such file or directory: b'/tmp/cubesviewer/as'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jjmontes/git/sitetool/env/bin/sit", line 11, in <module>
load_entry_point('sittool', 'console_scripts', 'sit')()
File "/home/jjmontes/git/sitetool/sitetool/core/bootstrap.py", line 166, in main
bootstrap.main() # sys.argv[1:]
File "/home/jjmontes/git/sitetool/sitetool/core/bootstrap.py", line 152, in main
st.command.run()
File "/home/jjmontes/git/sitetool/sitetool/sites.py", line 229, in run
size_data = self.ctx.get('sites').calculate_site_size(site)
File "/home/jjmontes/git/sitetool/sitetool/sites.py", line 154, in calculate_site_size
files = site.comp('files').file_list('')
File "/home/jjmontes/git/sitetool/sitetool/files/fs.py", line 48, in file_list
for (path, info) in remote_fs.walk.info(namespaces=['details']):
File "/home/jjmontes/git/sitetool/env/lib/python3.6/site-packages/fs-2.4.5-py3.6.egg/fs/walk.py", line 405, in info
for _path, info in _walk:
File "/home/jjmontes/git/sitetool/env/lib/python3.6/site-packages/fs-2.4.5-py3.6.egg/fs/walk.py", line 433, in _walk_breadth
for info in _scan(fs, dir_path, namespaces=namespaces):
File "/home/jjmontes/git/sitetool/env/lib/python3.6/site-packages/fs-2.4.5-py3.6.egg/fs/walk.py", line 298, in _scan
six.reraise(type(error), error)
File "/home/jjmontes/git/sitetool/env/lib/python3.6/site-packages/six-1.12.0-py3.6.egg/six.py", line 692, in reraise
raise value.with_traceback(tb)
fs.errors.ResourceNotFound: resource '/' not found
Python os.walk
provides an onerror
attribute to handle this situation. Is there as similar option for Filesystem? I'm facing a case where I want to report these errors but otherwise continue the operation.
I can workaround this by removing the 'details' namespace and calling remote_fs.getinfo()
myself but I fear that I might be incurring in a performance penalty if doing so.
There is an ignore_errors
flag which can be set to False
to ignore errors. And a on_error
callback which can be used to filter errors. See Walker referencefor details.
Does that handle your broken link?
Thank you for pointing me to the documentation, I had vastly overlooked kwargs there.
However, now that I'm handling the error with on_errors
, my on_errors handler receives an exception with the following message and path (corresponding to the stack trace above):
fs.errors.ResourceNotFound: resource '/' not found
However, the root cause of this failure is the previously mentioned path:
FileNotFoundError: [Errno 2] No such file or directory: b'/tmp/cubesviewer/as'
As a result, seems to me that we cannot list which paths are really causing the problem. Do you think this is expected behaviour or am I doing something wrong?
Also, even when I'm returning True to ignore errors, file listing stops at that point and it should continue.
(Note that this only happens if using the details
namespace).
ResourceNotFound
has an exc
attribute which contains the wrapped exception, FileNotFoundError
in your case. Furthermore, you could use remote_fs.getsyspath(err.path)
to get the real path causing an issue !
Since the provided path is '/', what I get from remote_fs.getsyspath
is the real root path of the filesystem. What I need is the relative path within the filesystem to the file that cannot be listed, and I find it available only in the wrapped exception (err.exc.filename
). But this relies on the exception being a ResourceNotFound
, and also on the wrapped exception being a FileNotFound
(which will possibly vary if the backend was different). And anyway this err.exc.filename
is an absolute path to the backend filesystem, and I need to show and compare relative paths (as the backend could be any). So I can't still get the relative path (eg. 'brokenlink'
instead of '/tmp/filesystem-root/brokenlink'
).
I was expecting that the 'path' argument of the on_error
handler would receive the filesystem-relative path to the file that caused the listing error (this way abstracting the resource location from the actual filesystem backend). But I'm receiving /
which I struggle to make sense of. Otherwise, what does path
mean? (sorry if I am misunderstanding something here).
But yet, note that the bigger issue here is that file listing stops at that point, while I need to ignore errors and continue walking!
The on_error callback should return True
if you want to continue scanning, or False
if you want to re-raise the exception (the docs are a little lacking in this area).
The path you get in that error is the directory that it was unable to list. If that path is /
it essentially means the the root of wherever the OSFS was constructed with.
I think what is happening is that the walker is treating a broken link within a directory the same as if the directory itself was broken. The 'fix' is probably to catch stat
related errors in scandir and simply not add the namespace it it couldn't be read.
I've tagged this as a bug. In the meantime, your workaround of removing the namespace and calling getinfo
is a reasonable one.
I'm returning True but listing stops. It also stops if I use ignore_errors
. This sounds related to your diagnostic of the root cause (treating symbolic links as if the entire parent directory could not be accessed). By the way also happens with 'tar' backends, in case that helps.
Side question: does calling 'stat' individually have any performance penalty? I'd expect that most filesystems could optimize this is properly using walk
.
Thanks for your support.
Regarding:
The 'fix' is probably to catch stat related errors in scandir and simply not add the namespace it it couldn't be read.
I think you meant not add the file/dir if it couldn't be read?
Makes sense to me. Still it should be reported as an error, providing the exact path that has failed, as client code needs to know something (and what) is missing from the result :-).
I think its stopping because it sees the root as broken, and there isn't anywhere to go after that.
Calling getinfo will have a small performance impact. Probably negligible if you are reading any data as well as listing directories
I think you meant not add the file/dir if it couldn't be read?
Actually, no. What I was thinking was to leave the info in there (with basic namespace), but no 'details' namespace. That would still allow you to see something was there, but that the details couldn't be read.
Calling getinfo will have a small performance impact. Probably negligible if you are reading any data as well as listing directories
I'm just listing directories. I think I'll try to avoid the workaround and wait for a solution upstream, since I can simply avoid broken symbolic links for now (development stage).
The solution you propose introduces a burden on the client code: it will have to check for the availability of the details namespace, since now some attributes may be missing (this may introduce some backwards compatibility issues, although I doubt it since users would already be seeing partial results). I do not object to this solution as long as client code is able to check that the file is a symbolic link and retrieve the target path it points to, or more in general, that the file caused a walk/getinfo error.
Thinking about it, I'd favour the on_error/ignore_errors approach. After all, a getinfo exception is still an exception in other parts of PyFilesystem API. The real problem is that the entire directory is failing, but that could be fixed and the error could still be reported by on_error/ignore_errors.
Otherwise the operation should fail entirely raising the exception, instead of providing partial results if the details
namespace could not be filled.
Just my two cents (and noting I'm very new to PyFilesystem, and both solutions solve my case, and I find pros and cons in both approaches).