
`NoCredentialsError: Unable to locate credentials` with `s3.describe_objects` and a valid `boto3_session` argument

Open ClementSicard opened this issue 3 years ago • 2 comments

Describe the bug

Given a valid boto3.Session, s3.describe_objects can describe a single object but fails on a list of objects, even though the documentation says a list of paths is supported.

How to Reproduce

In all cases, a valid session is passed to the function.

>>> wr.s3.list_objects(path="s3://clement-test-1", boto3_session=session)

['s3://clement-test-1/folder2/10.pdf',
 's3://clement-test-1/folder2/11.pdf',
 's3://clement-test-1/folder2/12.pdf',
 's3://clement-test-1/folder2/13.pdf',
 's3://clement-test-1/folder2/subfolder1/10.pdf',
 's3://clement-test-1/folder2/subfolder1/11.pdf',
 's3://clement-test-1/folder2/subfolder1/12.pdf',
 's3://clement-test-1/folder2/subfolder1/13.pdf']

This call works, so the session is valid (the bucket is private).

When I try to describe one of these objects:

>>> wr.s3.describe_objects(path='s3://clement-test-1/folder2/subfolder1/10.pdf', boto3_session=session)

{'s3://clement-test-1/folder2/subfolder1/10.pdf': {'ResponseMetadata': {'RequestId': 'xxxxxxx',
   'HostId': 'xxxxxxx',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'xxxxxxx',
    'x-amz-request-id': 'xxxxxxxxxxxxxx',
    'date': 'Tue, 02 Aug 2022 10:56:33 GMT',
    'last-modified': 'Tue, 21 Jun 2022 12:03:31 GMT',
    'etag': '"xxxxxxxxxxxxxx"',
    'accept-ranges': 'bytes',
    'content-type': 'application/pdf',
    'server': 'AmazonS3',
    'content-length': '14749033'},
   'RetryAttempts': 0},
  'AcceptRanges': 'bytes',
  'LastModified': datetime.datetime(2022, 6, 21, 12, 3, 31, tzinfo=tzutc()),
  'ContentLength': 14749033,
  'ETag': '"xxxxxxxxxxxxxx"',
  'ContentType': 'application/pdf',
  'Metadata': {}}}

But when I pass a list of paths as the path argument (which is supported according to the documentation), a NoCredentialsError is raised, even though the session is valid (it worked for the calls above) and the files exist in the bucket:

>>> wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)

with this stack trace:

---------------------------------------------------------------------------
NoCredentialsError                        Traceback (most recent call last)
/var/folders/y8/fqhzmbr93t1g76sjf_vschr80000gn/T/ipykernel_5709/2496337718.py in <cell line: 1>()
----> 1 wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in describe_objects(path, version_id, use_threads, last_modified_begin, last_modified_end, s3_additional_kwargs, boto3_session)
    154         versions = [version_id.get(p) if isinstance(version_id, dict) else version_id for p in paths]
    155         with concurrent.futures.ThreadPoolExecutor(max_workers=cpus) as executor:
--> 156             resp_list = list(
    157                 executor.map(
    158                     _describe_object_concurrent,

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result_iterator()
    607                     # Careful not to keep a reference to the popped future
    608                     if timeout is None:
--> 609                         yield fs.pop().result()
    610                     else:
    611                         yield fs.pop().result(end_time - time.monotonic())

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
    444                     raise CancelledError()
    445                 elif self._state == FINISHED:
--> 446                     return self.__get_result()
    447                 else:
    448                     raise TimeoutError()

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
    389         if self._exception:
    390             try:
--> 391                 raise self._exception
    392             finally:
    393                 # Break a reference cycle with the exception in self._exception

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py in run(self)
     56 
     57         try:
---> 58             result = self.fn(*self.args, **self.kwargs)
     59         except BaseException as exc:
     60             self.future.set_exception(exc)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object_concurrent(path, boto3_primitives, s3_additional_kwargs, version_id)
     48 ) -> Tuple[str, Dict[str, Any]]:
     49     boto3_session = _utils.boto3_from_primitives(primitives=boto3_primitives)
---> 50     return _describe_object(
     51         path=path, boto3_session=boto3_session, s3_additional_kwargs=s3_additional_kwargs, version_id=version_id
     52     )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object(path, boto3_session, s3_additional_kwargs, version_id)
     35     if version_id:
     36         extra_kwargs["VersionId"] = version_id
---> 37     desc = _utils.try_it(
     38         f=client_s3.head_object, ex=client_s3.exceptions.NoSuchKey, Bucket=bucket, Key=key, **extra_kwargs
     39     )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/_utils.py in try_it(f, ex, ex_code, base, max_num_tries, **kwargs)
    341     for i in range(max_num_tries):
    342         try:
--> 343             return f(**kwargs)
    344         except ex as exception:
    345             if ex_code is not None and hasattr(exception, "response"):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    506                 )
    507             # The "self" in this scope is referring to the BaseClient.
--> 508             return self._make_api_call(operation_name, kwargs)
    509 
    510         _api_call.__name__ = str(py_operation_name)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    896         else:
    897             apply_request_checksum(request_dict)
--> 898             http, parsed_response = self._make_request(
    899                 operation_model, request_dict, request_context
    900             )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_request(self, operation_model, request_dict, request_context)
    919     def _make_request(self, operation_model, request_dict, request_context):
    920         try:
--> 921             return self._endpoint.make_request(operation_model, request_dict)
    922         except Exception as e:
    923             self.meta.events.emit(

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in make_request(self, operation_model, request_dict)
    117             request_dict,
    118         )
--> 119         return self._send_request(request_dict, operation_model)
    120 
    121     def create_request(self, params, operation_model=None):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in _send_request(self, request_dict, operation_model)
    196         context = request_dict['context']
    197         self._update_retries_context(context, attempts)
--> 198         request = self.create_request(request_dict, operation_model)
    199         success_response, exception = self._get_response(
    200             request, operation_model, context

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in create_request(self, params, operation_model)
    132                 service_id=service_id, op_name=operation_model.name
    133             )
--> 134             self._event_emitter.emit(
    135                 event_name,
    136                 request=request,

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    410     def emit(self, event_name, **kwargs):
    411         aliased_event_name = self._alias_event_name(event_name)
--> 412         return self._emitter.emit(aliased_event_name, **kwargs)
    413 
    414     def emit_until_response(self, event_name, **kwargs):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    254                  handlers.
    255         """
--> 256         return self._emit(event_name, kwargs)
    257 
    258     def emit_until_response(self, event_name, **kwargs):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in _emit(self, event_name, kwargs, stop_on_response)
    237         for handler in handlers_to_call:
    238             logger.debug('Event %s: calling handler %s', event_name, handler)
--> 239             response = handler(**kwargs)
    240             responses.append((handler, response))
    241             if stop_on_response and response is not None:

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in handler(self, operation_name, request, **kwargs)
    101         # this method is invoked to sign the request.
    102         # Don't call this method directly.
--> 103         return self.sign(operation_name, request)
    104 
    105     def sign(

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
    185                     raise e
    186 
--> 187             auth.add_auth(request)
    188 
    189     def _choose_signer(self, operation_name, signing_type, context):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/auth.py in add_auth(self, request)
    405     def add_auth(self, request):
    406         if self.credentials is None:
--> 407             raise NoCredentialsError()
    408         datetime_now = datetime.datetime.utcnow()
    409         request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP)

NoCredentialsError: Unable to locate credentials
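
A possible stopgap while this is open: the single-path call succeeds, so the paths can be described one at a time and the results merged. This is only a sketch. `describe_one_by_one` is a hypothetical helper, not part of awswrangler, and it trades the library's thread pool for sequential requests.

```python
from typing import Any, Callable, Dict, Iterable


def describe_one_by_one(
    paths: Iterable[str],
    describe_single: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Call a single-path describe function per path and merge the results.

    `describe_single` is expected to return a dict keyed by the S3 path,
    which is the shape shown in the single-object output above.
    """
    merged: Dict[str, Any] = {}
    for p in paths:
        merged.update(describe_single(p))
    return merged


# Intended usage (assumes `wr` and `session` from the report above):
# result = describe_one_by_one(
#     paths,
#     lambda p: wr.s3.describe_objects(path=p, boto3_session=session),
# )
```

The trace also shows a `use_threads` parameter in the `describe_objects` signature. Passing `use_threads=False` might avoid the thread pool in the same way, but that was not tested in this thread.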

Expected behavior

I would expect the function to return the metadata for every object in the list and, most importantly, to locate the credentials in the boto3.Session, as it does in the single-file case.

Your project

No response

Screenshots

No response

OS

macOS

Python version

3.9.13

AWS DataWrangler version

2.16.1

Additional context

No response

ClementSicard, Aug 02 '22 11:08

Thanks for opening this, @ClementSicard. I will attempt to replicate it and get back to you soon.

malachi-constant, Aug 07 '22 01:08

Hmm, I am unable to replicate this, @ClementSicard.

>>> import boto3
>>> import awswrangler as wr
>>> wr.__version__
'2.16.1'
>>> my_session = boto3.session.Session()
>>> result = wr.s3.list_objects(path, boto3_session=my_session)
>>> wr.s3.describe_objects(path=result[1:3], boto3_session=my_session)
{'s3://hansonlu-test-data-bucket/csv/file1.csv': {'ResponseMetadata': {'RequestId': 'C99Y0HBTE8VKW090', 'HostId': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'x-amz-request-id': 'C99Y0HBTE8VKW090', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:46 GMT', 'etag': '"3fc4883f513a6ce7a3487e521e58de92"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '20'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 46, tzinfo=tzutc()), 'ContentLength': 20, 'ETag': '"3fc4883f513a6ce7a3487e521e58de92"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}, 's3://hansonlu-test-data-bucket/csv/file2.csv': {'ResponseMetadata': {'RequestId': 'C99MMKYCREFXS20S', 'HostId': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'x-amz-request-id': 'C99MMKYCREFXS20S', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:48 GMT', 'etag': '"13e27af06c955d43b12da432b839b204"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '14'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 48, tzinfo=tzutc()), 'ContentLength': 14, 'ETag': '"13e27af06c955d43b12da432b839b204"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}}

Is there any specific configuration in your session object I can test?
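
Something like the following could produce a summary that is safe to paste here. It is a minimal sketch, assuming the standard `boto3.Session` attributes `profile_name` and `region_name` and the `get_credentials()` method. `summarize_session` is a hypothetical helper, not an awswrangler function, and it deliberately never returns the keys themselves.

```python
from typing import Any, Dict


def summarize_session(session: Any) -> Dict[str, Any]:
    """Return non-secret facts about a boto3.Session-like object."""
    # get_credentials() returns None when no credentials can be resolved.
    creds = session.get_credentials()
    return {
        "profile_name": session.profile_name,
        "region_name": session.region_name,
        "has_credentials": creds is not None,
        # botocore credential objects carry a `method` string naming the
        # provider that supplied them; fall back to None if it is absent.
        "credential_method": getattr(creds, "method", None),
    }


# Intended usage (assumes `session` is the boto3.Session from the report):
# print(summarize_session(session))
```

Knowing how the credentials were resolved (profile, environment variables, explicit keys, and so on) may help narrow down why the worker threads cannot find them.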

malachi-constant, Aug 08 '22 17:08

Closing for now as bug cannot be replicated. Please reopen if this issue is persistent and more context can be provided.

malachi-constant, Aug 16 '22 15:08

I am experiencing an issue similar to the one reported above with the awswrangler.s3.describe_objects() method and a valid boto3 session.

The method works just fine when a single string with the path to a single S3 object is passed in. However, when the path arg is a prefix that contains multiple S3 objects, or a list of paths, this error is raised:

NoCredentialsError: Unable to locate credentials

ataghavey, Dec 05 '22 17:12