pyfilesystem2
pyfilesystem2 copied to clipboard
feat: raise an error if no FTP parsing matches
See https://github.com/PyFilesystem/pyfilesystem2/issues/506#issuecomment-990905758
So regarding this, here is how I think we could proceed, of course i can take input from the @PyFilesystem/maintainers :
- Add a new exception
UnsupportedFTPFormattofs.errorsinheritingValueError, which would be thrown byfs._ftp_parse.parse_linewhen no suitable decoder are found (instead of returningNone). This exception would mark that the FTP return a listing in a format that was not recognized - Update
ftpfs.ftp_errorsto take an opname in addition to the path, likeconvert_os_errorsdoes, and raise anerrors.UnsupportedOperationwrapping an eventualUnsupportedFTPFormatthat was raised in the internal parsing code.
The end-user would receive something like:
Traceback (most recent call last):
...
fs.errors.OperationFailed: operation failed, unsupported FTP listing format
and upon inspecting the object you could actually retrieve the line that was not recognized.
Is the second layer of catching -> re-raising necessary in this instance? Looking at the source it seems like convert_os_errors and ftp_errors have a clear purpose: catching arbitrary OS / FTP errors and converting them to well defined FS errors w/ a user friendly error message. Given that the UnsupportedFTPFormat we would be adding is something we are raising, can't we just add it to fs.errors and let it bubble up directly?
As a side note, is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?
Thank for explaining 😄
Is the second layer of catching -> re-raising necessary in this instance?
I would think so, yes, because I'd want to keep a layer of separation between the two issues. The FTP parser fails to parse the line, and because of this, listdir fails, for instance. I understand why it may be cumbersome at first, but I think it wouldn't be very natural of fs._ftp_parse.parse_line to raise an OperationFailed error, because it doesn't have access to the location of the resource like ftp_errors does when it wraps eventual errors.
As a side note, is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?
Really none, users shouldn't have to use them anywhere, but they may be useful for extension developers. They are not documented so I wouldn't really consider them part of the public API to be honest.
I think it wouldn't be very natural of
fs._ftp_parse.parse_lineto raise anOperationFailederror, because it doesn't have access to the location of the resource
That seems like a good enough reason to have that second layer. A user friendly error is important, so including information on the resource seems pretty essential.
is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?
Really none, users shouldn't have to use them anywhere, but they may be useful for extension developers. They are not documented so I wouldn't really consider them part of the public API to be honest.
Might be good to chuck a _ prefix on them then to make it explicit that they're an implementation detail.
I started working on this, but I noticed there's a test that seems to require the behavior of skipping unparsable lines: https://github.com/PyFilesystem/pyfilesystem2/blob/a6ea045e766c76bae2e2fde19294c8275773ffde/tests/test_ftp_parse.py#L194
It seems like the idea is to skip unparsable lines but parse other lines. That would make sense if on some systems there can be some sort of header line or something. I don't know enough about FTP specs to know if that's allowed or not. A couple options I can think of:
- Only raise the error if no lines match. I don't like this much because this would not have helped in the original scenario that prompted this issue if there were 1 file in the directory w/ a line that doesn't match the regex (the whole idea behind this issue is to raise an error instead of returning 0 files in the directory).
- Only skip blank lines on Windows NT (that's what the test is running on). This would require knowing the platform ahead of time, which I don't think we do. Also I think someone with more knowledge of FTP protocols than me should weigh in on this if that's the route taken.
- Figure out what
"unparsable line"is standing in for in the first place and write a regex + parser that matches exactly that (decode_windows_nt_headeror something) that returnsNoneand thus we skip the line because we know this is a line we should skip. This sounds like the best solution, but I am definitely not qualified to evaluate if such a thing is possible.
Hmm, that's not very satisfying indeed. This particular unit test was contributed by @sspross in #227 , so maybe he has an idea of what the "unparsable line" stands for here? There is a chance it may be an FTP banner or prolog, but I don't known if that's the case, and Windows NT FTP servers are clearly legacy so I cannot easily start one to check.
The other option on the safe side would be not to raise the error in v2.4.15, but actually throw a DeprecationWarning that says something like: Unrecognized line in FTP listing, this will be an error in PyFilesystem2 v2.5 so that we can break on the next minor release.
Let's give Silvian a couple days to reply and go from there 😃