pyfilesystem2 icon indicating copy to clipboard operation
pyfilesystem2 copied to clipboard

feat: raise an error if no FTP parsing matches

Open adriangb opened this issue 3 years ago • 8 comments

See https://github.com/PyFilesystem/pyfilesystem2/issues/506#issuecomment-990905758

adriangb avatar Dec 10 '21 14:12 adriangb

So regarding this, here is how I think we could proceed, of course i can take input from the @PyFilesystem/maintainers :

  • Add a new exception UnsupportedFTPFormat to fs.errors inheriting ValueError, which would be thrown by fs._ftp_parse.parse_line when no suitable decoder are found (instead of returning None). This exception would mark that the FTP return a listing in a format that was not recognized
  • Update ftpfs.ftp_errors to take an opname in addition to the path, like convert_os_errors does, and raise an errors.UnsupportedOperation wrapping an eventual UnsupportedFTPFormat that was raised in the internal parsing code.

The end-user would receive something like:

Traceback (most recent call last):
  ...
fs.errors.OperationFailed: operation failed, unsupported FTP listing format

and upon inspecting the object you could actually retrieve the line that was not recognized.

althonos avatar Dec 13 '21 15:12 althonos

Is the second layer of catching -> re-raising necessary in this instance? Looking at the source it seems like convert_os_errors and ftp_errors have a clear purpose: catching arbitrary OS / FTP errors and converting them to well defined FS errors w/ a user friendly error message. Given that the UnsupportedFTPFormat we would be adding is something we are raising, can't we just add it to fs.errors and let it bubble up directly?

As a side note, is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?

Thank for explaining 😄

adriangb avatar Dec 13 '21 15:12 adriangb

Is the second layer of catching -> re-raising necessary in this instance?

I would think so, yes, because I'd want to keep a layer of separation between the two issues. The FTP parser fails to parse the line, and because of this, listdir fails, for instance. I understand why it may be cumbersome at first, but I think it wouldn't be very natural of fs._ftp_parse.parse_line to raise an OperationFailed error, because it doesn't have access to the location of the resource like ftp_errors does when it wraps eventual errors.

As a side note, is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?

Really none, users shouldn't have to use them anywhere, but they may be useful for extension developers. They are not documented so I wouldn't really consider them part of the public API to be honest.

althonos avatar Dec 13 '21 17:12 althonos

I think it wouldn't be very natural of fs._ftp_parse.parse_line to raise an OperationFailed error, because it doesn't have access to the location of the resource

That seems like a good enough reason to have that second layer. A user friendly error is important, so including information on the resource seems pretty essential.

is there any particular reason that fs.ftpfs.ftp_errors and fs.error_tools. convert_os_errors are part of the public API?

Really none, users shouldn't have to use them anywhere, but they may be useful for extension developers. They are not documented so I wouldn't really consider them part of the public API to be honest.

Might be good to chuck a _ prefix on them then to make it explicit that they're an implementation detail.

adriangb avatar Dec 13 '21 17:12 adriangb

I started working on this, but I noticed there's a test that seems to require the behavior of skipping unparsable lines: https://github.com/PyFilesystem/pyfilesystem2/blob/a6ea045e766c76bae2e2fde19294c8275773ffde/tests/test_ftp_parse.py#L194

It seems like the idea is to skip unparsable lines but parse other lines. That would make sense if on some systems there can be some sort of header line or something. I don't know enough about FTP specs to know if that's allowed or not. A couple options I can think of:

  1. Only raise the error if no lines match. I don't like this much because this would not have helped in the original scenario that prompted this issue if there were 1 file in the directory w/ a line that doesn't match the regex (the whole idea behind this issue is to raise an error instead of returning 0 files in the directory).
  2. Only skip blank lines on Windows NT (that's what the test is running on). This would require knowing the platform ahead of time, which I don't think we do. Also I think someone with more knowledge of FTP protocols than me should weigh in on this if that's the route taken.
  3. Figure out what "unparsable line" is standing in for in the first place and write a regex + parser that matches exactly that (decode_windows_nt_header or something) that returns None and thus we skip the line because we know this is a line we should skip. This sounds like the best solution, but I am definitely not qualified to evaluate if such a thing is possible.

adriangb avatar Dec 14 '21 16:12 adriangb

Hmm, that's not very satisfying indeed. This particular unit test was contributed by @sspross in #227 , so maybe he has an idea of what the "unparsable line" stands for here? There is a chance it may be an FTP banner or prolog, but I don't known if that's the case, and Windows NT FTP servers are clearly legacy so I cannot easily start one to check.

althonos avatar Dec 17 '21 11:12 althonos

The other option on the safe side would be not to raise the error in v2.4.15, but actually throw a DeprecationWarning that says something like: Unrecognized line in FTP listing, this will be an error in PyFilesystem2 v2.5 so that we can break on the next minor release.

althonos avatar Dec 17 '21 11:12 althonos

Let's give Silvian a couple days to reply and go from there 😃

adriangb avatar Dec 17 '21 13:12 adriangb