filesystem_spec
filesystem_spec copied to clipboard
FileNotFoundError is raised when failing to access a http url
When failing to access a http url (e.g., a network error), FileNotFoundError
will be raised here: https://github.com/fsspec/filesystem_spec/blob/bdd752544ac4629deb4565103e4e42855af277a8/fsspec/implementations/http.py#L393-L397
Should this be considered a bug since FileNotFoundError
is usually be used to check whether hitting a 404 error?
When working with a local file system, you would expect to get a range of errors when operating on files, derived from IOError/OSError, like IsADirectoryError. FileNotFoundError is, though, by far the most common.
For the case of HTTP, there are a massive number of things that can go wrong, some of them are exceptions whereas some are successful calls that come back with an error code and message. In packages like s3fs and gcsfs (well-defined APIs), we go to the effort to translate the various cases into specific exception types. We could, in theory, do the same here - but the HTTP space is much broader and poorly defined from server to server, and I don't imagine we could do this rigorously.
Since the exception is raised from ...
, any catching code can inspect the original cause and report/behave appropriately.
Sorry. I was not clear. I didn't ask to translate the various cases into specific exception types. I'm wondering if we can avoid wrapping the original cause with FileNotFoundError
, such as just raising the original cause directly. FileNotFoundError has a special semantics and it's weird to use it to wrap arbitrary exceptions.
Raised when a file or directory is requested but doesn’t exist. Corresponds to errno ENOENT.
The thing is, in fsspec we are very much viewing the remote storage as if it were a file system, so I would say that FileNotFound is exactly the right exception. Users of fsspec need to be able to perform file-like operations on the various backends and expect to get file-related exceptions when these fail, not exceptions specific to the backend in question.
How about IOError which is a more general error?
Why would that fix your previous wish to not wrap the original exception, though? I don't think we can do both. I do think it's reasonable to make an attempt to understand the nature of the exception and raise accordingly (404->FileNotFound, 401/403->PermissionError), but there are many possibilities. An IOError should really have an error code to be useful.
Why would that fix your previous wish to not wrap the original exception, though?
We hit an issue that FileNotFoundError caused by a connection issue
was interpreted as the file doesn't exist
and went to an undesired code path when it should not. The issue I wish to fix to not raise FileNotFoundError
as it has a special semantics. IOError
is a more general error and people usually will fail fast rather than moving forward.
Might I suggest that the calling code should except as specific as possible for the code being called? Otherwise, we should indeed make a full exception translator, rather than falling back to a generic exception, I think.
Might I suggest that the calling code should except as specific as possible for the code being called?
Currently it's out of our control. It's happening in pyarrow and the error is dropped: https://github.com/apache/arrow/blob/e60e6b00ee76da275518ab231f1ea40dc7c7d1aa/python/pyarrow/fs.py#L309-L310
I feel there could be more libraries having the similar code. So it's better to fix it in fsspec.
cc @isidentical and anyone else
Perhaps we should standardise our errors in general and make a version of gcsfs.retry.validate_response
or s3fs.errors.Translate_boto_error
in this repo, that the others can make use of. Note that the former raises a RuntimeError if no more specific exception is matched, and the latter an IOError.
(excerpt from s3fs)
{...
"301": functools.partial(IOError, EREMCHG), # PermanentRedirect
"307": functools.partial(IOError, EREMCHG), # Redirect
"400": functools.partial(IOError, errno.EINVAL),
"403": PermissionError,
"404": FileNotFoundError,
"405": functools.partial(IOError, errno.EPERM),
"409": functools.partial(IOError, errno.EBUSY),
"412": functools.partial(IOError, errno.EINVAL), # PreconditionFailed
"416": functools.partial(IOError, errno.EINVAL), # InvalidRange
"500": functools.partial(IOError, EREMOTEIO), # InternalError
"501": functools.partial(IOError, errno.ENOSYS), # NotImplemented
"503": functools.partial(IOError, errno.EBUSY), # SlowDown
}