More precise error handling for exists().
This PR improves the error handling logic for s3fs.S3FileSystem.exists() and fixes #750
The first change is that there is no longer a generic `except Exception` block, meaning that e.g. a ConnectionError now correctly raises instead of returning False when querying a bucket.
The second change is that a warning is now logged when attempting to query the existence of a bucket that the user does not have access to - this is important since the bucket might actually exist. This is in line with Amazon's proposed behavior, see here.
For more information on this, see related issue #750.
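As a rough sketch of the resulting control flow (a stand-alone illustration, not s3fs's actual internals - `head_bucket` here is any callable standing in for the underlying HEAD request):

```python
import logging

logger = logging.getLogger(__name__)

def exists_bucket(head_bucket, bucket):
    """Illustrative exists() logic after this PR.

    head_bucket is a stand-in for the real S3 call and is expected to
    raise FileNotFoundError / PermissionError / ConnectionError.
    """
    try:
        head_bucket(bucket)
        return True
    except FileNotFoundError:
        # The bucket definitely does not exist and we were allowed to check.
        return False
    except PermissionError:
        # The bucket may exist, but we cannot tell; warn instead of guessing.
        logger.warning(
            "Bucket %s doesn't exist or you don't have access to it", bucket
        )
        return False
    # No generic `except Exception` any more: anything else
    # (e.g. ConnectionError) propagates to the caller.
```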
@martindurant this implements solution 2) mentioned in #750. I am happy to change it to another solution, as discussed.
I think 2) is the right choice.
Can we make a test which surfaces the ConnectionError? Perhaps just set the endpoint_url to something silly.
I will try to think of something, and also attempt to add tests for the other changes.
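For illustration, such a test could look roughly like the sketch below. `DeadEndpointFS` is a hypothetical stand-in used to keep the example self-contained; the real test would construct `s3fs.S3FileSystem` with a silly `endpoint_url`, and the exact exception botocore raises there may differ:

```python
import pytest

class DeadEndpointFS:
    """Hypothetical stand-in for s3fs.S3FileSystem pointed at an
    unreachable endpoint_url: every query fails to connect."""

    def exists(self, path):
        raise ConnectionError(f"could not reach endpoint for {path}")

def test_exists_surfaces_connection_error():
    fs = DeadEndpointFS()
    # After this PR the error propagates instead of being swallowed
    # into a False return value.
    with pytest.raises(ConnectionError):
        fs.exists("s3://some-bucket/")
```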
I created test coverage for two of the three new cases:
- A bucket that does not exist, but that we have access to check (raises FileNotFoundError, which we catch)
- A ConnectionError, as requested, standing in for all other error types where we are actually unable to even check whether a bucket exists. This is what this PR is trying to fix (since currently, False is returned instead).
However, I cannot figure out how to create good test coverage for the third path: the PermissionError returned when the endpoint prohibits listing buckets. I am unsure how to create an S3 file system locally that disallows listing the available buckets (which this test would require). The best I found is to trigger the PermissionError by providing wrong credentials:
```python
def test_exists_bucket_nonexistent_or_no_access(caplog):
    # Ensure that a warning is raised and False is returned if querying a
    # bucket that might not exist or might be private.
    fs = s3fs.S3FileSystem(key="asdfasfsdf", secret="sadfdsfds")
    assert not fs.exists("s3://this-bucket-might-not-exist/")
    assert caplog.records[0].levelname == "WARNING"
    assert "doesn't exist or you don't have access" in caplog.text
```
If this is good enough, it is already in the PR. But I am happy to improve this one.
Yes, that test is fine - it is, of course, failing on permissions with real AWS in this case (moto does not enforce credentials, which is why you couldn't test with that)
I finally managed to set up pre-commit; I ran into SSL errors before. So the PR should hopefully no longer fail because of that. I also added the requested changes. My tests pass locally, but I don't currently have the setup to test all Python versions.
Merged the PR, but removed a duplicate endpoint_url (this was raising a different error). In addition, the other test now clears the warnings cache as well - this was the reason for the test failure.
Forgot to run pre-commit; this is now fixed... But I am honestly not sure where the warnings are coming from. I even installed Python 3.8 locally and ran the tests there, and they run fine...
The problem is the Unclosed client session warning, but I am unable to reproduce how s3fs.exists() triggers that...
The following may remove the warnings:
```diff
--- a/s3fs/core.py
+++ b/s3fs/core.py
@@ -541,6 +541,12 @@ class S3FileSystem(AsyncFileSystem):
     @staticmethod
     def close_session(loop, s3):
         if loop is not None and loop.is_running():
+            try:
+                loop = asyncio.get_event_loop()
+                loop.create_task(s3.__aexit__(None, None, None))
+                return
+            except RuntimeError:
+                pass
             try:
                 sync(loop, s3.__aexit__, None, None, None, timeout=0.1)
```
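The idea behind the added hunk, demonstrated with a stand-in client (`FakeClient` is illustrative only; the diff uses `asyncio.get_event_loop()`, which behaves like `get_running_loop()` when called while a loop is running): if we are already on the running event loop, scheduling `__aexit__` as a task lets the loop close the session asynchronously instead of blocking on it.

```python
import asyncio

class FakeClient:
    """Stand-in for the aiobotocore client; only records that it was closed."""

    def __init__(self):
        self.closed = False

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True

async def main():
    client = FakeClient()
    # Inside a running loop, blocking on the close would deadlock;
    # scheduling it as a task lets the loop finish it cleanly.
    task = asyncio.get_running_loop().create_task(
        client.__aexit__(None, None, None)
    )
    await task
    return client.closed

print(asyncio.run(main()))  # prints True
```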
Your failures look like genuine problems.
For the client close warnings (which are not errors), https://github.com/fsspec/s3fs/pull/760 will fix this.
If we don't manage to fix the warnings issue, I could also remove the warning from this PR entirely. While I think it would be beneficial to warn users, it would already be an improvement to get rid of the generic except Exception clause. Let's see whether #760 changes things.