Do not hide S3 access errors

Open daniel-falk opened this issue 3 years ago • 3 comments

If, e.g., the wrong AWS profile is used, downloading the open datasets stored in S3 will fail. These errors were completely hidden and were instead reported as if the dataset did not exist, e.g.:

hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().

This commit creates a new exception type that is not caught, making it clear that an AWS S3 access error is the cause.

🚀 🚀 Pull Request

Checklist:

  • [x] My code follows the style guidelines of this project and the Contributing document
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have kept the coverage-rate up
  • [x] I have performed a self-review of my own code and resolved any problems
  • [x] I have checked to ensure there aren't any other open Pull Requests for the same change
  • [ ] I have described and made corresponding changes to the relevant documentation
  • [x] New and existing unit tests pass locally with my changes

Changes

This commit makes the error messages more useful for debugging. E.g., consider a user who has a default region set up for AWS:

#~/.aws/config
[default]
region: eu-north-1

When the user wants to try out hub they do:

python -c "import hub; hub.load('hub://activeloop/mnist-train')"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 407, in load
    raise DatasetHandlerError(
hub.util.exceptions.DatasetHandlerError: A Hub dataset does not exist at the given path (hub://activeloop/mnist-train). Check the path provided or in case you want to create a new dataset, use hub.empty().

The error message states that the dataset does not exist, which is really confusing for someone who has not used Hub before.

After this change the error will instead be:

python -c "import hub; hub.load('hub://activeloop/mnist-train')"

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 251, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 174, in dataset_exists
    storage[get_dataset_meta_key(FIRST_COMMIT_ID)]
  File "/home/daniel/src/Hub/hub/core/storage/lru_cache.py", line 189, in __getitem__
    result = self.next_storage[path]
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 211, in __getitem__
    return self.get_bytes(path)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 261, in get_bytes
    with manager(self, new_error_cls):  # type: ignore
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 77, in __exit__
    raise self.error_class(exc_value).with_traceback(exc_traceback)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 262, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/hub/core/storage/s3.py", line 224, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/workspace/shared_source/tflite-deep-learning-axis-camera/venv/lib/python3.10/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
hub.util.exceptions.S3GetAccessError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/hub/api/dataset.py", line 406, in load
    if not dataset_exists(cache_chain):
  File "/home/daniel/src/Hub/hub/util/keys.py", line 177, in dataset_exists
    raise AuthorizationException("The dataset storage cannot be accessed") from err
hub.util.exceptions.AuthorizationException: The dataset storage cannot be accessed
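
The behaviour shown in the traceback above can be illustrated with a minimal sketch. It is not the actual Hub code: the exception classes are stand-ins that reuse the names from the traceback, and `META_KEY` and `DeniedStorage` are hypothetical names introduced only for this example. The idea is that a missing key means "no dataset", while an access error is re-raised with `raise ... from err` so the original cause stays attached.

```python
class S3GetAccessError(Exception):
    """Stand-in for the access error raised by the storage layer."""


class AuthorizationException(Exception):
    """Stand-in for the error that is surfaced to the user."""


META_KEY = "dataset_meta.json"  # hypothetical key name, for illustration only


def dataset_exists(storage):
    """Return False only when the meta key is really missing."""
    try:
        storage[META_KEY]
        return True
    except KeyError:
        # The key is absent: the dataset really does not exist.
        return False
    except S3GetAccessError as err:
        # The storage could not be read at all: report that, keeping the cause.
        raise AuthorizationException("The dataset storage cannot be accessed") from err


class DeniedStorage:
    """Hypothetical storage whose reads always fail with an access error."""

    def __getitem__(self, key):
        raise S3GetAccessError("InvalidAccessKeyId")
```

With this shape, `dataset_exists({})` returns `False`, while `dataset_exists(DeniedStorage())` raises `AuthorizationException` whose `__cause__` is the `S3GetAccessError`, which is what produces the "direct cause" line in the traceback.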

daniel-falk avatar Sep 18 '22 09:09 daniel-falk

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 18 '22 09:09 CLAassistant

Hey there @daniel-falk. Thank you so much for the contribution! Can you please sign the Contributor License Agreement so we can review and merge the contribution? Also, please hit me up in slack (slack.activeloop.ai) so we can send some swag your way. :) We really appreciate the contribution!

mikayelh avatar Sep 18 '22 09:09 mikayelh

Thanks @mikayelh! CLA signed and you have a message in slack.

daniel-falk avatar Sep 18 '22 10:09 daniel-falk

What is the status of this PR? Should I rebase it onto the latest master?

daniel-falk avatar Nov 10 '22 14:11 daniel-falk

Hey @daniel-falk, thanks a lot for your patience. I'm very sorry it's taking so long to review this PR. Can you pls resolve conflicts and we'll review this asap. Thanks again for your contribution!!!

tatevikh avatar Nov 10 '22 14:11 tatevikh

Thanks for your contribution @daniel-falk! PR should be good to merge once conflicts are resolved

AbhinavTuli avatar Nov 10 '22 15:11 AbhinavTuli

Actually, it seems like this issue may have been solved by 3a9400b67? I can't seem to reproduce it anymore :+1:

daniel-falk avatar Nov 10 '22 18:11 daniel-falk

...perhaps not. I can still reproduce it if I try to load a dataset from an S3 bucket and there are no credentials configured:

python -c "import deeplake; deeplake.load('s3://fixedit-dev-test/deeplake-test')"
Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/humbug/report.py", line 498, in _hook
    self.error_report(error=exception_instance, tags=tags, publish=publish)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/humbug/report.py", line 244, in error_report
    traceback.format_exception(
TypeError: format_exception() got an unexpected keyword argument 'etype'

Original exception was:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daniel/src/Hub/deeplake/api/dataset.py", line 426, in load
    raise DatasetHandlerError(
deeplake.util.exceptions.DatasetHandlerError: A Deep Lake dataset does not exist at the given path (s3://fixedit-dev-test/deeplake-test). Check the path provided or in case you want to create a new dataset, use deeplake.empty().

this is actually triggered from:

Original exception was:
Traceback (most recent call last):
  File "/home/daniel/src/Hub/deeplake/core/storage/s3.py", line 237, in get_bytes
    return self._get_bytes(path, start_byte, end_byte)
  File "/home/daniel/src/Hub/deeplake/core/storage/s3.py", line 210, in _get_bytes
    resp = self.client.get_object(Bucket=self.bucket, Key=path, Range=range)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 515, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 917, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/client.py", line 940, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
    return self.sign(operation_name, request)
  File "/home/daniel/src/Hub/venv/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
    auth.add_auth(request)
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

...but this is just swallowed somewhere.
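
A side note on the `Error in sys.excepthook` output above: the `TypeError` about `etype` is separate from the S3 problem. In Python 3.10 the first parameter of `traceback.format_exception` was renamed from `etype` to `exc` and made positional-only, so passing it by keyword fails there. A minimal sketch of a call that works on both older and newer versions, with `format_error` being a hypothetical helper name used only for this example:

```python
import traceback


def format_error(exc):
    """Format an exception into a list of traceback lines."""
    # Positional arguments work both before and after Python 3.10, where the
    # first parameter was renamed from `etype` to `exc` and made positional-only.
    return traceback.format_exception(type(exc), exc, exc.__traceback__)


try:
    raise ValueError("boom")
except ValueError as err:
    lines = format_error(err)
```

Here `lines` is a list of strings starting with the `Traceback (most recent call last):` header and ending with the exception line.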

daniel-falk avatar Nov 10 '22 19:11 daniel-falk

This commit solves this specific error, but I think a larger issue is that we have a catch-all except which always re-raises the exception as S3GetError(err), which then seems to be swallowed and replaced with the message that the dataset does not exist. This is also very confusing during development, since any programming error inside the try block will just generate the same message about the dataset not existing.
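
The catch-all problem described above can be sketched in isolation. This is an illustration under stated assumptions, not the code in s3.py: `S3GetError` is a stand-in reusing the name from the comment, while `NotFound`, `get_bytes_broad`, `get_bytes_narrow`, and `buggy_fetch` are hypothetical names made up for the example.

```python
class S3GetError(Exception):
    """Stand-in for the generic storage read error."""


class NotFound(Exception):
    """Hypothetical error a backend raises when a key is missing."""


def get_bytes_broad(fetch, path):
    """Broad version: every failure, even a bug, becomes S3GetError."""
    try:
        return fetch(path)
    except Exception as err:
        raise S3GetError(err) from err


def get_bytes_narrow(fetch, path):
    """Narrow version: only a missing key is translated; the rest propagates."""
    try:
        return fetch(path)
    except NotFound as err:
        raise KeyError(path) from err


def buggy_fetch(path):
    # A programming error: strings have no such attribute.
    return path.no_such_attribute
```

Calling `get_bytes_broad(buggy_fetch, "meta")` raises `S3GetError`, so a caller that treats that error as "dataset missing" would hide the bug. `get_bytes_narrow(buggy_fetch, "meta")` lets the `AttributeError` surface unchanged.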

daniel-falk avatar Nov 10 '22 19:11 daniel-falk

Hi @farizrahman4u and @AbhinavTuli, I made some minor changes to fix the incorrect and ignored type hints in the s3.py file. Do you want to look again or can we merge?

daniel-falk avatar Nov 13 '22 10:11 daniel-falk

The failing tests don't seem to be related to my changes? One of them fails due to the deeplake __version__ string and the other seems to fail when decoding images. Do you think these issues have already been solved, so that I should rebase onto the latest main?

daniel-falk avatar Nov 17 '22 19:11 daniel-falk

@daniel-falk please pull main to fix backward compatibility issues.

farizrahman4u avatar Nov 18 '22 10:11 farizrahman4u

Codecov Report

Base: 89.04% // Head: 89.59% // Increases project coverage by +0.55% :tada:

Coverage data is based on head (eaab7f1) compared to base (215d109). Patch coverage: 18.02% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1884      +/-   ##
==========================================
+ Coverage   89.04%   89.59%   +0.55%     
==========================================
  Files         253      253              
  Lines       27430    27844     +414     
==========================================
+ Hits        24425    24947     +522     
+ Misses       3005     2897     -108     
Flag Coverage Δ
unittests 89.59% <18.02%> (+0.55%) :arrow_up:

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
deeplake/enterprise/dataloader.py 18.57% <0.00%> (+0.13%) :arrow_up:
setup.py 0.00% <0.00%> (ø)
deeplake/enterprise/util.py 18.07% <8.33%> (-0.85%) :arrow_down:
deeplake/enterprise/test_pytorch.py 22.85% <16.23%> (+2.89%) :arrow_up:
deeplake/enterprise/test_query.py 16.27% <20.00%> (+4.08%) :arrow_up:
deeplake/core/storage/s3.py 68.02% <22.22%> (-0.99%) :arrow_down:
deeplake/core/dataset/dataset.py 91.93% <40.00%> (-0.08%) :arrow_down:
deeplake/util/keys.py 96.87% <75.00%> (-1.00%) :arrow_down:
deeplake/__init__.py 94.73% <100.00%> (-0.10%) :arrow_down:
deeplake/util/exceptions.py 85.24% <100.00%> (+0.03%) :arrow_up:
... and 92 more

codecov[bot] avatar Nov 23 '22 15:11 codecov[bot]