
More precise error handling for exists().

kasuteru opened this pull request 2 years ago • 12 comments

This PR improves the error handling logic for s3fs.S3FileSystem.exists() and fixes #750

The first change is that there is no longer a generic except Exception block, meaning that e.g. a ConnectionError raised while querying a bucket now correctly propagates to the caller instead of being swallowed and turned into False.

The second change is that a warning is now logged when querying the existence of a bucket the user does not have access to - this matters because the bucket might actually exist. This is in line with Amazon's proposed behavior, see here.

For more information on this, see related issue #750.
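
For illustration, here is a minimal standalone sketch of the intended control flow - written against plain boto3 rather than s3fs's async internals, so the function and logger below are placeholders rather than the PR's actual code:

import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def bucket_exists(s3_client, bucket):
    # Stand-in for the logic this PR puts behind S3FileSystem.exists().
    try:
        s3_client.head_bucket(Bucket=bucket)
        return True
    except ClientError as e:
        code = e.response["Error"]["Code"]
        if code == "404":
            # The bucket definitively does not exist.
            return False
        if code == "403":
            # The bucket may well exist; we are just not allowed to query it.
            logger.warning(
                "Bucket %s doesn't exist or you don't have access to it",
                bucket,
            )
            return False
        raise
    # Deliberately no generic except Exception: a connection failure
    # (e.g. botocore's EndpointConnectionError) propagates to the caller
    # instead of silently turning into False.

# Usage: bucket_exists(boto3.client("s3"), "some-bucket")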

kasuteru • Jul 20 '23 16:07

@martindurant this implements solution 2) mentioned in #750. I am happy to change it to another solution, as discussed.

kasuteru • Jul 20 '23 16:07

I think 2) is the right choice.

martindurant • Jul 20 '23 17:07

Can we make a test which surfaces the ConnectionError? Perhaps just set the endpoint_url to something silly.
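
Something along these lines, perhaps (illustrative only - the bogus endpoint and the broad pytest.raises are deliberate, since the exact exception type surfaced by aiobotocore may vary):

import pytest
import s3fs

def test_exists_raises_on_connection_error():
    # With an unreachable endpoint, exists() cannot determine anything,
    # so it should raise instead of quietly returning False.
    fs = s3fs.S3FileSystem(
        key="testing",
        secret="testing",
        endpoint_url="http://localhost:1",  # nothing listens on port 1
    )
    with pytest.raises(Exception):  # e.g. a connection-level error
        fs.exists("s3://some-bucket/")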

martindurant • Jul 21 '23 14:07

I will try to think of something, and also attempt to add tests for the other changes.

kasuteru • Jul 21 '23 14:07

I created test coverage for two of the three new cases:

  • A bucket that does not exist, but that we are allowed to query (raises FileNotFoundError, which we catch) - see the sketch after this list
  • A ConnectionError as requested, standing in for all the other errors where we are actually unable to even check whether a bucket exists. This is the case this PR fixes (currently, False is returned instead).
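
For the first case, a standalone sketch of the idea, assuming a local moto server (the actual test reuses the suite's existing fixtures):

import s3fs
from moto.server import ThreadedMotoServer

def test_exists_bucket_not_found():
    # Against moto, querying an absent bucket yields a 404, which s3fs
    # translates to FileNotFoundError; exists() catches it and returns
    # False without warning or raising.
    server = ThreadedMotoServer(port=5555)
    server.start()
    try:
        fs = s3fs.S3FileSystem(
            key="testing",
            secret="testing",
            endpoint_url="http://127.0.0.1:5555",
        )
        assert not fs.exists("s3://no-such-bucket/")
    finally:
        server.stop()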

However, I cannot figure out how to create good test coverage for the third path: the PermissionError that is raised when the endpoint prohibits listing buckets. I am unsure how to set up an S3 file system locally that disallows listing the available buckets (which this test would require). The best I found is to trigger the PermissionError by providing wrong credentials:

import s3fs

def test_exists_bucket_nonexistent_or_no_access(caplog):
    # Ensure that a warning is logged and False is returned when querying a
    # bucket that might not exist or might be private.
    fs = s3fs.S3FileSystem(key="asdfasfsdf", secret="sadfdsfds")
    assert not fs.exists("s3://this-bucket-might-not-exist/")
    assert caplog.records[0].levelname == "WARNING"
    assert "doesn't exist or you don't have access" in caplog.text

If this is good enough, it is already in the PR - but I am happy to improve it.

kasuteru • Jul 21 '23 15:07

Yes, that test is fine - in this case it is, of course, failing on permissions against real AWS (moto does not enforce credentials, which is why you couldn't test this with it).
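
For example, something like the following passes against a local moto server even with garbage credentials, so the 403 path never triggers there:

import s3fs
from moto.server import ThreadedMotoServer

# moto does not validate credentials, so the PermissionError branch
# cannot be exercised against it.
server = ThreadedMotoServer(port=5556)
server.start()
try:
    fs = s3fs.S3FileSystem(
        key="garbage",
        secret="garbage",
        endpoint_url="http://127.0.0.1:5556",
    )
    fs.mkdir("some-bucket")  # succeeds despite the bogus key/secret
    assert fs.exists("some-bucket")
finally:
    server.stop()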

martindurant • Jul 21 '23 16:07

I finally managed to set up pre-commit; I had run into SSL errors before. So hopefully the PR will no longer fail because of that. I also made the requested changes. My tests pass locally, but I currently don't have the setup to test all Python versions.

kasuteru • Jul 24 '23 14:07

Merged the PR but removed the duplicate endpoint_url (it was raising a different error). In addition, the other test now clears the warnings cache as well - this was the reason for the test failure.
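
(By "clears the warnings cache" I mean resetting Python's per-module warning registries, roughly along these lines:)

import sys

def clear_warning_registries():
    # Python records already-emitted warnings per module; clearing the
    # registries lets a test assert that a warning fires again.
    for mod in list(sys.modules.values()):
        registry = getattr(mod, "__warningregistry__", None)
        if registry:
            registry.clear()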

kasuteru • Jul 25 '23 13:07

I forgot to run pre-commit; this is now fixed... But I am honestly not sure where the warnings are coming from - I even installed Python 3.8 locally and ran the tests there, and they pass fine...

The problem is the "Unclosed client session" warning, but I am unable to reproduce how s3fs.exists() triggers it...

[screenshot: CI output showing the "Unclosed client session" warnings]

kasuteru • Jul 25 '23 15:07

The following may remove the warnings:

--- a/s3fs/core.py
+++ b/s3fs/core.py
@@ -541,6 +541,12 @@ class S3FileSystem(AsyncFileSystem):
     @staticmethod
     def close_session(loop, s3):
         if loop is not None and loop.is_running():
+            try:
+                loop = asyncio.get_event_loop()
+                loop.create_task(s3.__aexit__(None, None, None))
+                return
+            except RuntimeError:
+                pass
             try:
                 sync(loop, s3.__aexit__, None, None, None, timeout=0.1)

martindurant • Jul 25 '23 15:07

Your failures look like genuine problems.

As for the client close warnings (which are not errors), https://github.com/fsspec/s3fs/pull/760 will fix them.

martindurant • Jul 27 '23 13:07

If we don't manage to fix the warnings issue, I could also remove the warning from the PR entirely. While I think it would be beneficial to warn users, getting rid of the generic except Exception clause would already be an improvement on its own. Let's see whether #760 changes things.

kasuteru • Jul 27 '23 13:07