django-storages

S3 backend: AWS machine roles (and other refreshable temporary credentials) do not get refreshed

Open kimvais opened this issue 9 months ago • 1 comments

When running on EKS with IRSA, the S3 client is instantiated at app start, and the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN returned by the STS AssumeRole call (handled by boto3) are effectively "hardcoded" in process memory, with no automation or API to refresh them.

This causes storages to fail completely once the AWS_SESSION_TOKEN expires. The AWS_WEB_IDENTITY_TOKEN will also expire on a long-running pod, but repeating the assume-role call with the new token (automatically refreshed by EKS) is probably handled by boto3.

kimvais avatar Feb 28 '25 12:02 kimvais

Similar setup and problem: before the first use of storages we have to assume another role in a different account. This works fine until the token expires. @kimvais, have you found a solution yet?


I have an idea, but have not tested it. Currently we override the _create_session() method to assume another role, which works fine.

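import boto3
from boto3.session import Session
from django.conf import settings
from storages.backends.s3 import S3Storage


class S3StorageBackend(S3Storage):
    def _create_session(self) -> Session:
        # Build the default session first, then use it to assume the target role.
        session = super()._create_session()
        sts_client = session.client("sts")

        assumed_role = sts_client.assume_role(
            RoleArn=settings.AWS_S3_ROLE_ASSUMPTION,
            RoleSessionName=settings.AWS_S3_ROLE_SESSION_NAME,
        )

        # These temporary credentials are static: nothing refreshes them on expiry.
        return boto3.Session(
            aws_access_key_id=assumed_role["Credentials"]["AccessKeyId"],
            aws_secret_access_key=assumed_role["Credentials"]["SecretAccessKey"],
            aws_session_token=assumed_role["Credentials"]["SessionToken"],
        )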

I have to say that I am experienced with neither boto3 nor django-storages, but we might override the connection property with something like this:

@property
def connection(self):
    connection = getattr(self._connections, "connection", None)

+    if connection is not None:
+        try:
+            connection.meta.client.head_bucket(Bucket=self.bucket_name)
+        except (NoCredentialsError, ClientError, EndpointConnectionError): # probably too broad error capturing
+            delattr(self._connections, "connection")
+            connection = None

    if connection is None:
        session = self._create_session()
        self._connections.connection = session.resource(
            "s3",
            region_name=self.region_name,
            use_ssl=self.use_ssl,
            endpoint_url=self.endpoint_url,
            config=self.client_config,
            verify=self.verify,
        )
    return self._connections.connection

However, I am not sure this is a "clean" approach, and I am not sure whether it conflicts with the threading handling, which I am also not confident about (see threading.local() in s3.py). Another idea: we might be able to get a token expiration timestamp from somewhere (e.g. during role assumption, or provided by EKS?) and use it to trigger a refresh.
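
A minimal sketch of the expiration-timestamp idea, using only the standard library so it can be tried without AWS access. ExpiringCache, its factory argument, and the five-minute margin are hypothetical names and values chosen for illustration; none of this is part of django-storages or boto3. The factory is assumed to return the object to cache together with the time at which it expires.

```python
import threading
from datetime import datetime, timedelta, timezone
from typing import Callable, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class ExpiringCache(Generic[T]):
    """Cache one value and rebuild it shortly before it expires.

    Hypothetical helper for illustration only.
    """

    def __init__(
        self,
        factory: Callable[[], Tuple[T, datetime]],
        margin: timedelta = timedelta(minutes=5),
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._factory = factory      # returns (value, expires_at)
        self._margin = margin        # refresh this long before expiry
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._expires_at: Optional[datetime] = None

    def get(self) -> T:
        # The lock ensures only one thread rebuilds the value at a time.
        with self._lock:
            stale = (
                self._expires_at is None
                or self._clock() >= self._expires_at - self._margin
            )
            if stale:
                self._value, self._expires_at = self._factory()
            return self._value
```

In the setup from this thread, the factory could perform the assume_role call and return the resulting session or resource together with assumed_role["Credentials"]["Expiration"], which boto3 should return as a timezone-aware datetime. Note that this sketch shares one value across threads, whereas s3.py keeps one connection per thread via threading.local(); if the cached object is not thread-safe, one cache per thread is probably the safer choice.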

JohnTrunix avatar Jun 06 '25 14:06 JohnTrunix

Can we confirm this issue is specific to IRSA or role assumption, rather than affecting all "machine roles"?

EC2 instance profiles and ECS / Fargate task roles seem to be refreshed by AWS itself ("push" mode update to the metadata API), whereas with SSO / OIDC / an assumed role the credentials expire and a refresh needs to be initiated by the client ("pull" mode).

There is an issue with boto3 / botocore itself regarding assumed-role credentials not being refreshed automatically:

  • https://github.com/boto/boto3/issues/443
  • https://github.com/boto/botocore/issues/761

It also seems that at least one person addresses the role assumption refresh with an add-on: https://github.com/benkehoe/aws-assume-role-lib

More discussion: https://www.reddit.com/r/aws/comments/1ek2022/autorenewing_iam_role_inside_a_container/

leetrout avatar Sep 14 '25 02:09 leetrout