filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

FileNotFoundError is raised when failing to access a http url

Open zsxwing opened this issue 3 years ago • 10 comments

When failing to access a http url (e.g., a network error), FileNotFoundError will be raised here: https://github.com/fsspec/filesystem_spec/blob/bdd752544ac4629deb4565103e4e42855af277a8/fsspec/implementations/http.py#L393-L397

Should this be considered a bug since FileNotFoundError is usually be used to check whether hitting a 404 error?

zsxwing avatar Dec 29 '21 08:12 zsxwing

When working with a local file system, you would expect to get a range of errors when operating on files, derived from IOError/OSError, like IsADirectoryError. FileNotFoundError is, though, by far the most common.

For the case of HTTP, there are a massive number of things that can go wrong, some of them are exceptions whereas some are successful calls that come back with an error code and message. In packages like s3fs and gcsfs (well-defined APIs), we go to the effort to translate the various cases into specific exception types. We could, in theory, do the same here - but the HTTP space is much broader and poorly defined from server to server, and I don't imagine we could do this rigorously.

Since the exception is raised from ..., any catching code can inspect the original cause and report/behave appropriately.

martindurant avatar Jan 04 '22 14:01 martindurant

Sorry. I was not clear. I didn't ask to translate the various cases into specific exception types. I'm wondering if we can avoid wrapping the original cause with FileNotFoundError, such as just raising the original cause directly. FileNotFoundError has a special semantics and it's weird to use it to wrap arbitrary exceptions.

Raised when a file or directory is requested but doesn’t exist. Corresponds to errno ENOENT.

zsxwing avatar Jan 04 '22 21:01 zsxwing

The thing is, in fsspec we are very much viewing the remote storage as if it were a file system, so I would say that FileNotFound is exactly the right exception. Users of fsspec need to be able to perform file-like operations on the various backends and expect to get file-related exceptions when these fail, not exceptions specific to the backend in question.

martindurant avatar Jan 04 '22 21:01 martindurant

How about IOError which is a more general error?

zsxwing avatar Jan 04 '22 21:01 zsxwing

Why would that fix your previous wish to not wrap the original exception, though? I don't think we can do both. I do think it's reasonable to make an attempt to understand the nature of the exception and raise accordingly (404->FileNotFound, 401/403->PermissionError), but there are many possibilities. An IOError should really have an error code to be useful.

martindurant avatar Jan 04 '22 21:01 martindurant

Why would that fix your previous wish to not wrap the original exception, though?

We hit an issue that FileNotFoundError caused by a connection issue was interpreted as the file doesn't exist and went to an undesired code path when it should not. The issue I wish to fix to not raise FileNotFoundError as it has a special semantics. IOError is a more general error and people usually will fail fast rather than moving forward.

zsxwing avatar Jan 04 '22 22:01 zsxwing

Might I suggest that the calling code should except as specific as possible for the code being called? Otherwise, we should indeed make a full exception translator, rather than falling back to a generic exception, I think.

martindurant avatar Jan 05 '22 22:01 martindurant

Might I suggest that the calling code should except as specific as possible for the code being called?

Currently it's out of our control. It's happening in pyarrow and the error is dropped: https://github.com/apache/arrow/blob/e60e6b00ee76da275518ab231f1ea40dc7c7d1aa/python/pyarrow/fs.py#L309-L310

I feel there could be more libraries having the similar code. So it's better to fix it in fsspec.

zsxwing avatar Jan 10 '22 05:01 zsxwing

cc @isidentical and anyone else

Perhaps we should standardise our errors in general and make a version of gcsfs.retry.validate_response or s3fs.errors.Translate_boto_error in this repo, that the others can make use of. Note that the former raises a RuntimeError if no more specific exception is matched, and the latter an IOError.

martindurant avatar Jan 10 '22 16:01 martindurant

(excerpt from s3fs)

{...
    "301": functools.partial(IOError, EREMCHG),  # PermanentRedirect
    "307": functools.partial(IOError, EREMCHG),  # Redirect
    "400": functools.partial(IOError, errno.EINVAL),
    "403": PermissionError,
    "404": FileNotFoundError,
    "405": functools.partial(IOError, errno.EPERM),
    "409": functools.partial(IOError, errno.EBUSY),
    "412": functools.partial(IOError, errno.EINVAL),  # PreconditionFailed
    "416": functools.partial(IOError, errno.EINVAL),  # InvalidRange
    "500": functools.partial(IOError, EREMOTEIO),  # InternalError
    "501": functools.partial(IOError, errno.ENOSYS),  # NotImplemented
    "503": functools.partial(IOError, errno.EBUSY),  # SlowDown
}

martindurant avatar Jan 10 '22 16:01 martindurant