deeplake
Do not hide S3 access errors
If e.g. the wrong AWS profile is used, the download of the open datasets stored in S3 will fail. These errors were completely hidden and were instead displayed as if the dataset does not exist, e.g.:
```
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().
```
This commit creates a new exception type which is not caught and swallowed, to make it clear that an AWS S3 access error is the cause.
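Roughly, the idea looks like this (a minimal sketch, not the literal diff; the exception names `S3GetError` and `S3GetAccessError` come from the tracebacks below, while the error-code list, the helper name, and the choice to keep the new type outside the `S3GetError` hierarchy are illustrative assumptions):

```python
import botocore.exceptions

# S3 error codes that mean "you may not read this", as opposed to
# "this object does not exist" (assumed list, for illustration only).
ACCESS_ERROR_CODES = {"InvalidAccessKeyId", "AccessDenied", "ExpiredToken"}

class S3GetError(Exception):
    """Generic S3 read failure (the only type that was raised before)."""

class S3GetAccessError(Exception):
    """S3 read failed because of credentials or permissions.

    Deliberately not a subclass of S3GetError, so that existing
    `except S3GetError` blocks cannot swallow it (my assumption
    about the design, based on the tracebacks below).
    """

def get_object_bytes(client, bucket, key):
    # Hypothetical helper standing in for S3Provider._get_bytes.
    try:
        resp = client.get_object(Bucket=bucket, Key=key)
        return resp["Body"].read()
    except botocore.exceptions.ClientError as err:
        if err.response["Error"]["Code"] in ACCESS_ERROR_CODES:
            raise S3GetAccessError(err) from err
        raise S3GetError(err) from err
```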
Pull Request
Checklist:
- [x] My code follows the style guidelines of this project and the Contributing document
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have kept the coverage-rate up
- [x] I have performed a self-review of my own code and resolved any problems
- [x] I have checked to ensure there aren't any other open Pull Requests for the same change
- [ ] I have described and made corresponding changes to the relevant documentation
- [x] New and existing unit tests pass locally with my changes
Changes
This commit makes the error messages more useful for debugging. For example, consider a user who has a default region set up for AWS:
```ini
# ~/.aws/config
[default]
region: eu-north-1
```
When the user wants to try out hub, they run:
```
$ python -c "import hub; hub.load('hub://activeloop/mnist-train')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 407, in load
    raise DatasetHandlerError(
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().
```
The error message states that the dataset does not exist, which is really confusing for someone who has not used Hub before.
After this change, the error will instead be:
```
$ python -c "import hub; hub.load('hub://activeloop/mnist-train')"
Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 251, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 174, in dataset_exists
    storage[get_dataset_meta_key(FIRST_COMMIT_ID)]
  File "/home/daniel/src/Hub/hub/core/storage/lru_cache.py", line 189, in __getitem__
    result = self.next_storage[path]
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 211, in __getitem__
    return self.get_bytes(path)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 261, in get_bytes
    with manager(self, new_error_cls):  # type: ignore
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 77, in __exit__
    raise self.error_class(exc_value).with_traceback(exc_traceback)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
hub.util.exceptions.S3GetAccessError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 406, in load
    if not dataset_exists(cache_chain):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 177, in dataset_exists
    raise AuthorizationException("The dataset storage cannot be accessed") from err
hub.util.exceptions.AuthorizationException: The dataset storage cannot be accessed
```
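The bottom of the traceback shows the other half of the change: `dataset_exists` now distinguishes "the meta key is missing" from "the storage could not be read at all". A rough sketch of that logic (simplified; the names match the traceback above, but the import paths, in particular where `FIRST_COMMIT_ID` lives, are my assumptions):

```python
from hub.constants import FIRST_COMMIT_ID  # assumed location of the constant
from hub.util.exceptions import AuthorizationException, S3GetAccessError
from hub.util.keys import get_dataset_meta_key

def dataset_exists(storage) -> bool:
    try:
        # Probe for the dataset meta of the first commit, as the
        # traceback above shows the existence check doing.
        storage[get_dataset_meta_key(FIRST_COMMIT_ID)]
        return True
    except S3GetAccessError as err:
        # We could not check at all -- surface the access problem
        # instead of pretending the dataset is missing.
        raise AuthorizationException("The dataset storage cannot be accessed") from err
    except KeyError:
        # The meta key really is absent: no dataset at this path.
        return False
```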
Hey there @daniel-falk. Thank you so much for the contribution! Can you please sign the Contributor License Agreement so we can review and merge the contribution? Also, please hit me up in slack (slack.activeloop.ai) so we can send over some swag your way. :) we really appreciate the contribution!
Thanks @mikayelh! CLA signed and you have a message in slack.
What is the status of this PR? Should I rebase it to latest master?
Hey @daniel-falk thanks a lot for your patience. I'm very sorry it's taking so long to review this PR. Can you pls resolve conflicts and we'll review this asap. Thanks again for the contribution!!!
Thanks for your contribution @daniel-falk! PR should be good to merge once conflicts are resolved
Actually, it seems like this issue has been solved by 3a9400b67? I can't seem to reproduce it anymore :+1:
...perhaps not. I can still reproduce it if I try to load a dataset from an S3 bucket and there are no credentials configured:
```
$ python -c "import deeplake; deeplake.load('s3://fixedit-dev-test/deeplake-test')"
Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/humbug/report.py", line 498, in _hook
    self.error_report(error=exception_instance, tags=tags, publish=publish)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/humbug/report.py", line 244, in error_report
    traceback.format_exception(
TypeError: format_exception() got an unexpected keyword argument 'etype'

Original exception was:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/deeplake/api/dataset.py", line 426, in load
    raise DatasetHandlerError(
deeplake.util.exceptions.DatasetHandlerError: A Deep Lake dataset does not exist at the given path (s3://fixedit-dev-test/deeplake-test). Check the path provided or in case you want to create a new dataset, use deeplake.empty().
```
This is actually triggered from:
```
Original exception was:
Traceback (most recent call last):
  File "/home/daniel/src/Hub/deeplake/core/storage/s3.py", line 237, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/deeplake/core/storage/s3.py", line 210, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 515, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 917, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 940, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
    return self.sign(operation_name, request)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
    auth.add_auth(request)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
```
...but this is just swallowed somewhere.
This commit solves this specific error, but I think a larger issue is that we have a catch-all except clause which always re-raises an S3GetError(err) exception, which then seems to be swallowed and replaced with the message that the dataset does not exist. This is also very confusing during development, since any programming error inside the try/except will just generate the same message about the dataset not existing. The sketch below makes the complaint concrete.
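A hypothetical condensed version of the anti-pattern versus the fix (not the literal code; the function names are made up, and the exception types are as in the sketch in the PR description above):

```python
import botocore.exceptions

def get_bytes_before(fetch, path):
    # Before: every failure collapses into one generic error, which the
    # caller then reports as "dataset does not exist". Even a plain bug
    # inside `fetch` produces the same misleading message.
    try:
        return fetch(path)
    except Exception as err:
        raise S3GetError(err)

def get_bytes_after(fetch, path):
    # After: credential and permission problems keep their own type,
    # and genuinely unexpected errors propagate unchanged.
    try:
        return fetch(path)
    except botocore.exceptions.NoCredentialsError as err:
        raise S3GetAccessError(err) from err
    except botocore.exceptions.ClientError as err:
        if err.response["Error"]["Code"] in ("InvalidAccessKeyId", "AccessDenied"):
            raise S3GetAccessError(err) from err
        raise S3GetError(err) from err
```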
Hi @farizrahman4u and @AbhinavTuli, I made some minor changes to fix the incorrect and ignored type hints in the s3.py file. Do you want to look again, or can we merge?
The failing tests do not seem to be related to my changes? One of the failing tests is due to the deeplake `__version__` string and the other seems to fail when decoding images. Do you think these issues are already solved, so that I should rebase to latest main?
@daniel-falk please pull main to fix backward compatibility issues.
Codecov Report
Base: 89.04% // Head: 89.59% // Increases project coverage by +0.55% :tada:
Coverage data is based on head (eaab7f1) compared to base (215d109). Patch coverage: 18.02% of modified lines in pull request are covered.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1884      +/-   ##
==========================================
+ Coverage   89.04%   89.59%   +0.55%
==========================================
  Files         253      253
  Lines       27430    27844     +414
==========================================
+ Hits        24425    24947     +522
+ Misses       3005     2897     -108
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 89.59% <18.02%> (+0.55%) | :arrow_up: |

Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| deeplake/enterprise/dataloader.py | 18.57% <0.00%> (+0.13%) | :arrow_up: |
| setup.py | 0.00% <0.00%> (ø) | |
| deeplake/enterprise/util.py | 18.07% <8.33%> (-0.85%) | :arrow_down: |
| deeplake/enterprise/test_pytorch.py | 22.85% <16.23%> (+2.89%) | :arrow_up: |
| deeplake/enterprise/test_query.py | 16.27% <20.00%> (+4.08%) | :arrow_up: |
| deeplake/core/storage/s3.py | 68.02% <22.22%> (-0.99%) | :arrow_down: |
| deeplake/core/dataset/dataset.py | 91.93% <40.00%> (-0.08%) | :arrow_down: |
| deeplake/util/keys.py | 96.87% <75.00%> (-1.00%) | :arrow_down: |
| deeplake/__init__.py | 94.73% <100.00%> (-0.10%) | :arrow_down: |
| deeplake/util/exceptions.py | 85.24% <100.00%> (+0.03%) | :arrow_up: |
| ... and 92 more | | |
:umbrella: View full report at Codecov.