boto3

Clients and resources should be cached

Status: Open • benkehoe opened this issue 2 years ago • 10 comments

When creating a vanilla client like session = boto3.Session(); s3 = session.client("s3"), the returned client is not cached. I can't see any reason why this client would differ between calls, and creating a client can take quite a long time. Especially given that it's generally better to pass a session around than the clients themselves, it'd be nice if the client were cached when it is first created.

It wouldn't use caching if credentials or a Config object were passed, and it could maybe even have a single cache entry per service. Maybe it could be better solved with more caching in botocore.

It's achievable today with the standard library's functools, though with a "developer beware" label:

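import functools
import boto3

# Requires Python 3.9+ (functools.cache). The cache key includes the session
# itself plus every argument, so each session gets its own cached entries.
boto3.Session.client = functools.cache(boto3.Session.client)
boto3.Session.resource = functools.cache(boto3.Session.resource)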

benkehoe, Mar 21 '22

Thanks @benkehoe for the feature request. I brought this up for discussion with the team, and they had some concerns around the implementation and backwards compatibility. This could potentially be an opt-in feature, but the proposed benefits and use cases are worth discussing further.

tim-finnigan, Mar 23 '22

I think my ideal opt-in interface might look like this:

session = boto3.Session(caching=True) # maybe allow an option for a cache object to be provided

client1 = session.client("s3")
client2 = session.client("s3")
assert client2 is client1

# allow the cache to be circumvented if needed
client3 = session.client("s3", caching=False)
assert client3 is not client1

Module-level functions would be served by the existing setup_default_session() function:

boto3.setup_default_session(caching=True)

client1 = boto3.client("s3")
client2 = boto3.client("s3")
assert client2 is client1

benkehoe, Mar 24 '22
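The `caching=True` option above is a proposal, not an existing boto3 parameter. A rough approximation of its semantics is a wrapper around an existing session, sketched below with a hypothetical `CachingSession` class and a stand-in `FakeSession` for `boto3.Session`. Following the first post, it bypasses the cache when explicit credentials or a `config` are passed, and it honors a per-call `caching=False`.

```python
import threading

# Per the original proposal, explicit credentials or a Config bypass the cache.
_UNCACHEABLE = {"config", "aws_access_key_id", "aws_secret_access_key", "aws_session_token"}


class FakeSession:
    """Stand-in for boto3.Session; returns a new object for every client() call."""

    def client(self, service_name, **kwargs):
        return object()


class CachingSession:
    """Hypothetical wrapper approximating the proposed Session(caching=True)."""

    def __init__(self, session, caching=True):
        self._session = session
        self._caching = caching
        self._cache = {}
        self._lock = threading.Lock()  # guards the cache and serializes client creation

    def client(self, service_name, caching=None, **kwargs):
        # Most of boto3's optional client() arguments default to None, so an
        # explicit None is treated the same as leaving the argument out.
        kwargs = {k: v for k, v in kwargs.items() if v is not None}
        use_cache = self._caching if caching is None else caching
        if not use_cache or kwargs.keys() & _UNCACHEABLE:
            return self._session.client(service_name, **kwargs)
        key = (service_name, tuple(sorted(kwargs.items())))
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._session.client(service_name, **kwargs)
            return self._cache[key]


session = CachingSession(FakeSession())
client1 = session.client("s3")
assert session.client("s3") is client1                     # served from the cache
assert session.client("s3", caching=False) is not client1  # explicit bypass
```

Holding the lock while creating the client is a deliberate choice here: boto3's documentation describes `Session` objects as not thread-safe, so serializing creation avoids building clients from one session concurrently.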

@tim-finnigan Are there any problems that could arise from caching the client? For instance, if I run in Kubernetes (EKS) with IAM Roles for Service Accounts, where the token rotates every hour, does that mean a call made to the cached client right after the token rotates will fail with a nice 403?

If I remember correctly, the Session holds the configuration for the credentials, so if that is cached and the token is cached, this will not work after rotation.

Correct me if I'm wrong.

mbelang, Jul 07 '22

Client caching doesn't affect credential refreshing. The credential provider that handles web identity tokens (used for EKS service roles) automatically deals with expiration and refreshing. You don't need to get a new client for those credentials to get refreshed.

benkehoe, Jul 07 '22
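A rough model of why caching and refreshing don't conflict, using illustrative stand-in classes rather than botocore's real ones: the client keeps a reference to a credentials object and asks it for current values on every request, and that object refreshes itself once its expiry passes. botocore's refreshable credentials follow this principle, but the names `RefreshingCredentials`, `FakeClient`, and `current_token` below are hypothetical.

```python
import itertools


class RefreshingCredentials:
    """Illustrative stand-in for refreshable credentials (not botocore's class)."""

    def __init__(self, fetch, clock):
        self._fetch = fetch  # returns (token, expires_at); e.g. a web identity exchange
        self._clock = clock  # injectable time source so the demo is deterministic
        self._token, self._expires_at = fetch()

    def current_token(self):
        # Refresh lazily, at the moment a caller actually needs the token.
        if self._clock() >= self._expires_at:
            self._token, self._expires_at = self._fetch()
        return self._token


class FakeClient:
    """Illustrative stand-in: holds the credentials object, not a copy of the token."""

    def __init__(self, credentials):
        self._credentials = credentials

    def make_request(self):
        return f"signed with {self._credentials.current_token()}"


now = [0]  # fake clock, in seconds
serial = itertools.count(1)
creds = RefreshingCredentials(lambda: (f"token-{next(serial)}", now[0] + 3600), lambda: now[0])
client = FakeClient(creds)  # created once and reused, as a cached client would be
print(client.make_request())  # prints: signed with token-1
now[0] = 3600  # an hour later the token has expired
print(client.make_request())  # prints: signed with token-2
```

The same client object signs the second request with the new token, which is why reusing a client does not freeze its credentials.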

@benkehoe Great. Is the same true for the session, i.e., can we cache the session?

mbelang, Jul 11 '22

@mbelang The session itself represents configuration and credentials. It doesn't need caching internal to boto3, but it's intended to be passed around in your code wherever clients/resources are needed (and are intended to use the same config/credentials), i.e., "cached" in your code. The refreshable credentials for web identity, for example, are refreshed by the session for any client created on the session. I wrote an explainer on why to use sessions. For example, when you create a library that makes AWS API calls, it should take an optional session as input (creating one itself using boto3.Session() if none is provided), and then get the appropriate client from the session. This pattern can lead to clients being created on the session multiple times in different places, which is why client caching would be beneficial.

benkehoe, Jul 11 '22
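The library pattern described in this reply might look like the minimal sketch below. `count_buckets` is a hypothetical library function; `boto3.Session`, `Session.client`, and S3's `list_buckets` are real. Importing boto3 inside the function is just a choice for this sketch, so callers that always pass a session never need it loaded; a top-level import works equally well.

```python
def count_buckets(session=None):
    """Hypothetical library function following the 'optional session' pattern."""
    if session is None:
        import boto3  # only needed when the caller supplies no session

        session = boto3.Session()
    # The client is requested from the session on every call; without client
    # caching, this constructs a new S3 client each time the function runs.
    s3 = session.client("s3")
    return len(s3.list_buckets()["Buckets"])
```

A caller that already has a session, for example `boto3.Session(profile_name="dev")`, passes it in so the library uses the same configuration and credentials. Every such library function repeats the `session.client(...)` call, which is where per-session client caching would pay off.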

Yeah, I read your article a couple of weeks ago and didn't realize I was talking to you. What you explain is pretty clean, and I understand what I need to do now :)

mbelang, Jul 11 '22

Hi, any update on this request?

This would help speed up our use case as well.

ktruong248, May 07 '23

Could this have an adverse effect on pytest/moto, e.g., when some tests are mocked and some are not, resulting in all tests sharing the same session/client?

estahn, Aug 07 '23

> Could this have an adverse effect on pytest/moto, e.g., when some tests are mocked and some are not, resulting in all tests sharing the same session/client?

That's why it needs to be opt-in, rather than enabled by default. Separately, I would argue every test should create its own session (and to start with, tests should use sessions, rather than the module-level functions which all share the same default session).

benkehoe, Aug 07 '23
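Applied to the test suites in question, the per-test-session advice might look like the sketch below, using the standard library's `unittest`. `FakeSession` is a stand-in for a `boto3.Session` with the proposed (not yet existing) caching turned on; in a real suite, `setUp` would create the actual session, for example after any mocking is set up. Because each test gets its own session, a client cached while one test is mocked is never handed to a test that is not.

```python
import unittest


class FakeSession:
    """Stand-in for a boto3.Session with opt-in client caching enabled."""

    def __init__(self):
        self._clients = {}

    def client(self, service_name):
        # Reuse one client per service name for the lifetime of this session.
        return self._clients.setdefault(service_name, object())


class StorageTests(unittest.TestCase):
    def setUp(self):
        # A fresh session per test: nothing cached here is visible to other tests.
        self.session = FakeSession()

    def test_client_is_reused_within_one_test(self):
        self.assertIs(self.session.client("s3"), self.session.client("s3"))

    def test_starts_with_an_empty_cache(self):
        self.assertEqual(self.session._clients, {})
```

Run it with `python -m unittest`; the second test passes regardless of execution order because no cache survives from one test to the next.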