deeplake
Do not hide S3 access errors
If, for example, the wrong AWS profile is used, downloading the open datasets stored in S3 fails. These errors were completely hidden and were instead reported as if the dataset did not exist, e.g.:
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().
This commit creates a new exception type that is not caught, which makes it clear that an AWS S3 access error is the cause.
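The difference between hiding and surfacing the access error can be sketched as follows. This is a minimal illustration, not the code from this PR: `FakeStorage`, `META_KEY`, and `dataset_exists_hiding_errors` are hypothetical stand-ins, and the two exception classes are local stubs that only borrow the names seen in the tracebacks.

```python
class S3GetAccessError(Exception):
    """Local stub: an S3 read failed because of credentials or permissions."""


class AuthorizationException(Exception):
    """Local stub: the error that is surfaced to the user."""


class FakeStorage(dict):
    """Hypothetical storage backend that can simulate rejected credentials."""

    def __init__(self, *args, deny_access=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.deny_access = deny_access

    def __getitem__(self, key):
        if self.deny_access:
            raise S3GetAccessError("InvalidAccessKeyId")
        return super().__getitem__(key)


META_KEY = "dataset_meta.json"  # hypothetical key name, for illustration only


def dataset_exists_hiding_errors(storage):
    # Behaviour as described above: any failure reads as "no dataset".
    try:
        storage[META_KEY]
        return True
    except Exception:
        return False


def dataset_exists(storage):
    # Only a missing key means "no dataset"; an access error is re-raised
    # with the original exception attached as its cause.
    try:
        storage[META_KEY]
        return True
    except S3GetAccessError as err:
        raise AuthorizationException("The dataset storage cannot be accessed") from err
    except KeyError:
        return False
```

With a storage that denies access, `dataset_exists_hiding_errors` returns `False`, while `dataset_exists` raises `AuthorizationException` and keeps the underlying error reachable through `__cause__`.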
🚀 🚀 Pull Request
Checklist:
- [x] My code follows the style guidelines of this project and the Contributing document
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have kept the coverage-rate up
- [x] I have performed a self-review of my own code and resolved any problems
- [x] I have checked to ensure there aren't any other open Pull Requests for the same change
- [ ] I have described and made corresponding changes to the relevant documentation
- [x] New and existing unit tests pass locally with my changes
Changes
This commit makes the error messages more useful for debugging. For example, consider a user who has a default region set up for AWS:
#~/.aws/config
[default]
region: eu-north-1
When the user wants to try out Hub, they run:
python -c "import hub; hub.load('hub://activeloop/mnist-train')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/daniel/src/Hub/hub/api/dataset.py", line 407, in load
raise DatasetHandlerError(
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().
The error message states that the dataset does not exist, which is really confusing for someone who has not used Hub before.
After this change the error will instead be:
python -c "import hub; hub.load('hub://activeloop/mnist-train')"
Traceback (most recent call last):
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 251, in get_bytes
return self._get_bytes(path, start_byte, end_byte)
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
return self._get_bytes(path, start_byte, end_byte)
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/daniel/src/Hub/hub/util/keys.py", line 174, in dataset_exists
storage[get_dataset_meta_key(FIRST_COMMIT_ID)]
File "/home/daniel/src/Hub/hub/core/storage/lru_cache.py", line 189, in __getitem__
result = self.next_storage[path]
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 211, in __getitem__
return self.get_bytes(path)
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 261, in get_bytes
with manager(self, new_error_cls): # type: ignore
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 77, in __exit__
raise self.error_class(exc_value).with_traceback(exc_traceback)
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
return self._get_bytes(path, start_byte, end_byte)
File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
hub.util.exceptions.S3GetAccessError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/daniel/src/Hub/hub/api/dataset.py", line 406, in load
if not dataset_exists(cache_chain):
File "/home/daniel/src/Hub/hub/util/keys.py", line 177, in dataset_exists
raise AuthorizationException("The dataset storage cannot be accessed") from err
hub.util.exceptions.AuthorizationException: The dataset storage cannot be accessed
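The traceback above shows an exception being re-raised from a context manager's `__exit__` with the original traceback attached. A minimal sketch of that pattern is given below. The `ReraiseAs` class and the `fetch` function are hypothetical names for illustration, `S3GetAccessError` is a local stub, and `PermissionError` stands in for the botocore `ClientError`; the real implementation in `s3.py` may differ.

```python
class S3GetAccessError(Exception):
    """Local stub for the new exception type."""


class ReraiseAs:
    """Convert any exception raised inside the `with` block to `error_class`."""

    def __init__(self, error_class):
        self.error_class = error_class

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_value is not None:
            # Keep the original traceback so the failing call stays visible.
            raise self.error_class(exc_value).with_traceback(exc_traceback)
        return False


def fetch():
    # Stand-in for the S3 client call that rejects the credentials.
    raise PermissionError("InvalidAccessKeyId")


try:
    with ReraiseAs(S3GetAccessError):
        fetch()
except S3GetAccessError as err:
    print(type(err).__name__, err)
```

Because the converted exception has its own type, callers further up can catch it specifically instead of treating every failure as a missing dataset.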
Hey there @daniel-falk. Thank you so much for the contribution! Can you please sign the Contributor License Agreement so we can review and merge the contribution? Also, please hit me up in Slack (slack.activeloop.ai) so we can send some swag your way. :) We really appreciate the contribution!
Thanks @mikayelh! CLA signed and you have a message in slack.