
Do not hide S3 access errors

Open daniel-falk opened this issue 1 year ago • 3 comments

If, for example, the wrong AWS profile is used, downloading the open datasets stored in S3 fails. These errors were completely hidden and were instead reported as if the dataset did not exist, e.g.:

hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().

This commit creates a new exception type that is not caught and silenced, which makes it clear that an AWS S3 access error is the cause.
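The idea is that the "does a dataset exist here?" check should treat only a genuinely missing key as "no dataset", and let an access failure surface as its own error. The sketch below illustrates that split under stated assumptions: the exception classes are local stand-ins named after the ones in the tracebacks below, and `meta_key="dataset_meta.json"` and `DeniedStorage` are hypothetical placeholders, not Hub's actual key or storage classes.

```python
# Hypothetical stand-ins: defined locally so the sketch runs on its own,
# named after the exceptions that appear in the tracebacks in this PR.
class S3GetAccessError(Exception):
    """Raised by a storage read when S3 rejects the request (e.g. bad credentials)."""


class AuthorizationException(Exception):
    """Raised to the user when the dataset storage cannot be accessed."""


def dataset_exists(storage, meta_key="dataset_meta.json"):
    """Return whether the dataset metadata key exists in `storage`.

    `meta_key` is a hypothetical placeholder for the real metadata key.
    Only a missing key counts as "no dataset"; access errors are re-raised.
    """
    try:
        storage[meta_key]
        return True
    except KeyError:
        # The storage answered and the key is not there: no dataset at this path.
        return False
    except S3GetAccessError as err:
        # The storage could not be read at all: report that instead of "not found".
        raise AuthorizationException("The dataset storage cannot be accessed") from err


class DeniedStorage:
    """Hypothetical storage whose every read fails as if the AWS key were invalid."""

    def __getitem__(self, key):
        raise S3GetAccessError("InvalidAccessKeyId")
```

With this split, `dataset_exists({})` returns `False`, while `dataset_exists(DeniedStorage())` raises `AuthorizationException` whose `__cause__` is the `S3GetAccessError`, which is the "direct cause" chain visible at the end of the new traceback.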

🚀 🚀 Pull Request

Checklist:

  • [x] My code follows the style guidelines of this project and the Contributing document
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have kept the coverage-rate up
  • [x] I have performed a self-review of my own code and resolved any problems
  • [x] I have checked to ensure there aren't any other open Pull Requests for the same change
  • [ ] I have described and made corresponding changes to the relevant documentation
  • [x] New and existing unit tests pass locally with my changes

Changes

This commit makes the error messages more useful for debugging. For example, consider a user who has a default region set up for AWS:

# ~/.aws/config
[default]
region = eu-north-1

When the user wants to try out Hub, they run:

python -c "import hub; hub.load('hub://activeloop/mnist-train')"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 407, in load
    raise DatasetHandlerError(
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().

The error message states that the dataset does not exist, which is very confusing for someone who has not used Hub before.

After this change the error will instead be:

python -c "import hub; hub.load('hub://activeloop/mnist-train')"

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 251, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 174, in dataset_exists
    storage[get_dataset_meta_key(FIRST_COMMIT_ID)]
  File "/home/daniel/src/Hub/hub/core/storage/lru_cache.py", line 189, in __getitem__
    result = self.next_storage[path]
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 211, in __getitem__
    return self.get_bytes(path)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 261, in get_bytes
    with manager(self, new_error_cls):  # type: ignore
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 77, in __exit__
    raise self.error_class(exc_value).with_traceback(exc_traceback)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
hub.util.exceptions.S3GetAccessError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 406, in load
    if not dataset_exists(cache_chain):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 177, in dataset_exists
    raise AuthorizationException("The dataset storage cannot be accessed") from err
hub.util.exceptions.AuthorizationException: The dataset storage cannot be accessed
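The traceback shows the mechanism: `get_bytes` wraps the boto call in a context manager whose `__exit__` re-raises the botocore error as a different class (`raise self.error_class(exc_value).with_traceback(exc_traceback)`), and `dataset_exists` then converts that into `AuthorizationException`. The sketch below reproduces only the re-raising step, under assumptions: `ReraiseAs` and `get_bytes` are hypothetical names, and `ClientError` is a local stand-in for botocore's class so the example runs without AWS.

```python
class ClientError(Exception):
    """Local stand-in for botocore.exceptions.ClientError (no botocore needed)."""


class S3GetAccessError(Exception):
    """Hypothetical stand-in for the new S3 access error type."""


class ReraiseAs:
    """Hypothetical context manager: turn a ClientError into `error_class`.

    The original traceback is kept via with_traceback(), so the report still
    points at the failing S3 call.
    """

    def __init__(self, error_class):
        self.error_class = error_class

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is not None and issubclass(exc_type, ClientError):
            # Raised while the ClientError is being handled, so Python records it
            # as __context__ ("During handling of the above exception ...").
            raise self.error_class(exc_value).with_traceback(exc_traceback)
        # Returning False lets any other exception (or none) propagate unchanged.
        return False


def get_bytes(path):
    """Hypothetical read that fails the way the traceback above does."""
    with ReraiseAs(S3GetAccessError):
        raise ClientError(
            "An error occurred (InvalidAccessKeyId) when calling the GetObject operation"
        )
```

Because the new type is distinct from the "key not found" case, a caller such as `dataset_exists` can catch it specifically and report an access problem instead of a missing dataset.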

daniel-falk, Sep 18 '22

CLA assistant check
All committers have signed the CLA.

CLAassistant, Sep 18 '22

Hey there @daniel-falk. Thank you so much for the contribution! Can you please sign the Contributor License Agreement so we can review and merge the contribution? Also, please hit me up in Slack (slack.activeloop.ai) so we can send some swag your way. :) We really appreciate the contribution!

mikayelh, Sep 18 '22

Thanks @mikayelh! CLA signed, and you have a message in Slack.

daniel-falk, Sep 18 '22