aws-sdk-pandas
`NoCredentialsError: Unable to locate credentials` with `s3.describe_objects` and a valid `boto3_session` argument
Describe the bug
When passed a valid boto3.Session, s3.describe_objects can describe a single object but not a list of objects, although the library documentation says a list of paths is supported.
How to Reproduce
In all cases, a valid session is provided to the function
>>> wr.s3.list_objects(path="s3://clement-test-1", boto3_session=session)
['s3://clement-test-1/folder2/10.pdf',
's3://clement-test-1/folder2/11.pdf',
's3://clement-test-1/folder2/12.pdf',
's3://clement-test-1/folder2/13.pdf',
's3://clement-test-1/folder2/subfolder1/10.pdf',
's3://clement-test-1/folder2/subfolder1/11.pdf',
's3://clement-test-1/folder2/subfolder1/12.pdf',
's3://clement-test-1/folder2/subfolder1/13.pdf']
This query works, so the session is valid (the bucket is private).
When I try to describe one of these objects:
>>> wr.s3.describe_objects(path='s3://clement-test-1/folder2/subfolder1/10.pdf', boto3_session=session)
{'s3://clement-test-1/folder2/subfolder1/10.pdf': {'ResponseMetadata': {'RequestId': 'xxxxxxx',
'HostId': 'xxxxxxx',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amz-id-2': 'xxxxxxx',
'x-amz-request-id': 'xxxxxxxxxxxxxx',
'date': 'Tue, 02 Aug 2022 10:56:33 GMT',
'last-modified': 'Tue, 21 Jun 2022 12:03:31 GMT',
'etag': '"xxxxxxxxxxxxxx"',
'accept-ranges': 'bytes',
'content-type': 'application/pdf',
'server': 'AmazonS3',
'content-length': '14749033'},
'RetryAttempts': 0},
'AcceptRanges': 'bytes',
'LastModified': datetime.datetime(2022, 6, 21, 12, 3, 31, tzinfo=tzutc()),
'ContentLength': 14749033,
'ETag': '"xxxxxxxxxxxxxx"',
'ContentType': 'application/pdf',
'Metadata': {}}}
But when I pass a list of paths for path (which is supported according to the documentation), a NoCredentialsError is raised, even though the session is valid (it worked for the calls above) and the files exist in the bucket:
>>> wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)
with this stack trace:
Stack trace
---------------------------------------------------------------------------
NoCredentialsError Traceback (most recent call last)
/var/folders/y8/fqhzmbr93t1g76sjf_vschr80000gn/T/ipykernel_5709/2496337718.py in <cell line: 1>()
----> 1 wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in describe_objects(path, version_id, use_threads, last_modified_begin, last_modified_end, s3_additional_kwargs, boto3_session)
154 versions = [version_id.get(p) if isinstance(version_id, dict) else version_id for p in paths]
155 with concurrent.futures.ThreadPoolExecutor(max_workers=cpus) as executor:
--> 156 resp_list = list(
157 executor.map(
158 _describe_object_concurrent,
/opt/homebrew/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result_iterator()
607 # Careful not to keep a reference to the popped future
608 if timeout is None:
--> 609 yield fs.pop().result()
610 else:
611 yield fs.pop().result(end_time - time.monotonic())
/opt/homebrew/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
444 raise CancelledError()
445 elif self._state == FINISHED:
--> 446 return self.__get_result()
447 else:
448 raise TimeoutError()
/opt/homebrew/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
389 if self._exception:
390 try:
--> 391 raise self._exception
392 finally:
393 # Break a reference cycle with the exception in self._exception
/opt/homebrew/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py in run(self)
56
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object_concurrent(path, boto3_primitives, s3_additional_kwargs, version_id)
48 ) -> Tuple[str, Dict[str, Any]]:
49 boto3_session = _utils.boto3_from_primitives(primitives=boto3_primitives)
---> 50 return _describe_object(
51 path=path, boto3_session=boto3_session, s3_additional_kwargs=s3_additional_kwargs, version_id=version_id
52 )
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object(path, boto3_session, s3_additional_kwargs, version_id)
35 if version_id:
36 extra_kwargs["VersionId"] = version_id
---> 37 desc = _utils.try_it(
38 f=client_s3.head_object, ex=client_s3.exceptions.NoSuchKey, Bucket=bucket, Key=key, **extra_kwargs
39 )
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/_utils.py in try_it(f, ex, ex_code, base, max_num_tries, **kwargs)
341 for i in range(max_num_tries):
342 try:
--> 343 return f(**kwargs)
344 except ex as exception:
345 if ex_code is not None and hasattr(exception, "response"):
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
506 )
507 # The "self" in this scope is referring to the BaseClient.
--> 508 return self._make_api_call(operation_name, kwargs)
509
510 _api_call.__name__ = str(py_operation_name)
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
896 else:
897 apply_request_checksum(request_dict)
--> 898 http, parsed_response = self._make_request(
899 operation_model, request_dict, request_context
900 )
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_request(self, operation_model, request_dict, request_context)
919 def _make_request(self, operation_model, request_dict, request_context):
920 try:
--> 921 return self._endpoint.make_request(operation_model, request_dict)
922 except Exception as e:
923 self.meta.events.emit(
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in make_request(self, operation_model, request_dict)
117 request_dict,
118 )
--> 119 return self._send_request(request_dict, operation_model)
120
121 def create_request(self, params, operation_model=None):
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in _send_request(self, request_dict, operation_model)
196 context = request_dict['context']
197 self._update_retries_context(context, attempts)
--> 198 request = self.create_request(request_dict, operation_model)
199 success_response, exception = self._get_response(
200 request, operation_model, context
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in create_request(self, params, operation_model)
132 service_id=service_id, op_name=operation_model.name
133 )
--> 134 self._event_emitter.emit(
135 event_name,
136 request=request,
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
410 def emit(self, event_name, **kwargs):
411 aliased_event_name = self._alias_event_name(event_name)
--> 412 return self._emitter.emit(aliased_event_name, **kwargs)
413
414 def emit_until_response(self, event_name, **kwargs):
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
254 handlers.
255 """
--> 256 return self._emit(event_name, kwargs)
257
258 def emit_until_response(self, event_name, **kwargs):
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in _emit(self, event_name, kwargs, stop_on_response)
237 for handler in handlers_to_call:
238 logger.debug('Event %s: calling handler %s', event_name, handler)
--> 239 response = handler(**kwargs)
240 responses.append((handler, response))
241 if stop_on_response and response is not None:
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in handler(self, operation_name, request, **kwargs)
101 # this method is invoked to sign the request.
102 # Don't call this method directly.
--> 103 return self.sign(operation_name, request)
104
105 def sign(
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
185 raise e
186
--> 187 auth.add_auth(request)
188
189 def _choose_signer(self, operation_name, signing_type, context):
~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/auth.py in add_auth(self, request)
405 def add_auth(self, request):
406 if self.credentials is None:
--> 407 raise NoCredentialsError()
408 datetime_now = datetime.datetime.utcnow()
409 request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP)
NoCredentialsError: Unable to locate credentials
Expected behavior
I would expect the function to return the metadata for every object in the list and, most importantly, to locate the credentials in the boto3.Session correctly, as it does in the single-file case.
Your project
No response
Screenshots
No response
OS
macOS
Python version
3.9.13
AWS DataWrangler version
2.16.1
Additional context
No response
Thanks for opening this, @ClementSicard. I will attempt to replicate it and get back to you soon.
Hmm I am unable to replicate @ClementSicard
>>> import boto3
>>> import awswrangler as wr
>>> wr.__version__
'2.16.1'
>>> my_session = boto3.session.Session()
>>> result = wr.s3.list_objects(path, boto3_session=my_session)
>>> wr.s3.describe_objects(path=result[1:3], boto3_session=my_session)
{'s3://hansonlu-test-data-bucket/csv/file1.csv': {'ResponseMetadata': {'RequestId': 'C99Y0HBTE8VKW090', 'HostId': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'x-amz-request-id': 'C99Y0HBTE8VKW090', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:46 GMT', 'etag': '"3fc4883f513a6ce7a3487e521e58de92"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '20'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 46, tzinfo=tzutc()), 'ContentLength': 20, 'ETag': '"3fc4883f513a6ce7a3487e521e58de92"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}, 's3://hansonlu-test-data-bucket/csv/file2.csv': {'ResponseMetadata': {'RequestId': 'C99MMKYCREFXS20S', 'HostId': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'x-amz-request-id': 'C99MMKYCREFXS20S', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:48 GMT', 'etag': '"13e27af06c955d43b12da432b839b204"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '14'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 48, tzinfo=tzutc()), 'ContentLength': 14, 'ETag': '"13e27af06c955d43b12da432b839b204"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}}
Is there any specific configuration in your session object I can test?
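To help answer that question, here is a minimal sketch of how the session configuration could be summarized without printing any secrets. The helper name `summarize_session` is hypothetical and not part of awswrangler. It only reads attributes that a boto3.Session exposes (`profile_name`, `region_name`, `get_credentials()`); how the credentials were resolved is reported through the `method` attribute of the returned credentials object, when present.

```python
from typing import Any, Dict


def summarize_session(session: Any) -> Dict[str, Any]:
    """Return a secret-free summary of a boto3.Session-like object.

    Hypothetical helper for bug reports; it never returns key material.
    """
    creds = session.get_credentials()  # None when no credentials are found
    return {
        "profile_name": session.profile_name,
        "region_name": session.region_name,
        "has_credentials": creds is not None,
        # e.g. "env", "shared-credentials-file", "assume-role"
        "credential_method": getattr(creds, "method", None),
        # True for temporary credentials that carry a session token
        "has_session_token": bool(getattr(creds, "token", None)),
    }


# Usage (requires boto3 and a configured environment, so not run here):
# import boto3
# print(summarize_session(boto3.session.Session()))
```

Posting the resulting dictionary for both a working and a failing session may show whether the two differ, for example in how the credentials were obtained.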
Closing for now as bug cannot be replicated. Please reopen if this issue is persistent and more context can be provided.
I am experiencing a similar issue as reported above with the awswrangler.s3.describe_objects() method and a valid boto3 session.
The method works just fine when a single string with the path to one S3 object is passed in. However, when a prefix that contains multiple S3 objects, or a list of paths, is passed in for the path argument, this error is raised:
NoCredentialsError: Unable to locate credentials
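Since the single-path call succeeds and the stack trace above fails inside the ThreadPoolExecutor branch, one possible workaround (untested in this thread, so treat it as an assumption) is to describe the objects one at a time and merge the results. The helper name `describe_one_by_one` is hypothetical; it takes the single-path call as a callable so that it stays independent of awswrangler.

```python
from typing import Any, Callable, Dict, Iterable


def describe_one_by_one(
    paths: Iterable[str],
    describe_one: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Call a single-path describe function per path and merge the results.

    Hypothetical workaround helper; `describe_one` must return a dict
    keyed by path, as wr.s3.describe_objects does for a single path.
    """
    merged: Dict[str, Any] = {}
    for path in paths:
        merged.update(describe_one(path))
    return merged


# Usage with awswrangler (needs AWS access, so not run here):
# import awswrangler as wr
# result = describe_one_by_one(
#     paths,
#     lambda p: wr.s3.describe_objects(path=p, boto3_session=session),
# )
```

The signature in the stack trace also shows a `use_threads` parameter on describe_objects. Passing `use_threads=False` might avoid the threaded branch as well, but that has not been confirmed here.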