boto3
Read timeout on lambda invocation after credentials renewal
Describe the bug
I'm invoking a Lambda function using boto3 and authenticating with the AssumeRoleWithWebIdentity method.
When I invoke the Lambda while the credentials are still valid, there's no issue.
If I invoke the Lambda after they have expired, boto3 first rotates the credentials (that part seems to work, as seen in the attached logs). Once the new credentials have been retrieved, boto3 tries to invoke the Lambda, and that is where I encounter a read timeout:
[...] retry needed, retryable exception caught: Read timeout on endpoint URL: \"https://lambda.XXXX.amazonaws.comXXXXXX\"" [...]
After the timeout has expired, boto3 automatically retries the invocation, and this time it works.
Expected Behavior
The Lambda invocation should succeed on the first try, even when new credentials need to be retrieved.
Current Behavior
Here are the full boto3 logs. I've replaced every token/ID with XXX:
logs.txt
Reproduction Steps
- Authenticate to AWS with the AssumeRoleWithWebIdentity method
- Invoke a Lambda
- Wait 60 min for the credentials to expire
- Invoke a Lambda -> the first attempt fails with a read timeout
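The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the app: the function name my-function, the payload, and the helper names invoke_twice and make_lambda_client are hypothetical, and it assumes the web identity credentials are picked up from the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables, which botocore's default credential chain supports.

```python
import json
import time


def make_lambda_client(region_name="us-west-2"):
    """Build a Lambda client with DEBUG logging enabled.

    boto3 is imported lazily so the timing helper below can be
    exercised with a stand-in client. Credentials are assumed to
    come from AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE.
    """
    import logging

    import boto3

    # An empty logger name enables logging for boto3, botocore and urllib3.
    boto3.set_stream_logger("", logging.DEBUG)
    return boto3.client("lambda", region_name=region_name)


def invoke_twice(client, function_name, wait_seconds):
    """Invoke once, wait for the credentials to expire, invoke again.

    Returns the wall-clock duration of each call in seconds, so a
    second call that hits a read timeout before its retry stands out.
    """
    durations = []
    for attempt in range(2):
        start = time.monotonic()
        client.invoke(
            FunctionName=function_name,  # hypothetical function name
            Payload=json.dumps({"attempt": attempt}).encode(),
        )
        durations.append(time.monotonic() - start)
        if attempt == 0:
            time.sleep(wait_seconds)
    return durations
```

Calling invoke_twice(make_lambda_client(), "my-function", wait_seconds=60 * 60) should reproduce the behavior described here if the second duration is roughly one read timeout longer than the first.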
Possible Solution
Looking at the logs, I noticed that the failed Lambda invocation does not start its own HTTP connection. Instead, it seems to reuse this one:
timestamp=2022-05-23T12:27:07.970380+00:00 level=DEBUG message="Starting new HTTPS connection (1): sts.amazonaws.com:443"
This one targets sts.amazonaws.com, and it times out.
Then, before the second attempt, a new HTTP connection is started:
timestamp=2022-05-23T12:28:08.572227+00:00 level=DEBUG message="Starting new HTTPS connection (2): lambda.us-west-2.amazonaws.com:443"
This one targets lambda.us-west-2.amazonaws.com, and it succeeds.
Could the problem come from the fact that the two connections are not made against the same endpoint, so that the Lambda invocation is trying to use sts instead of lambda?
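To make the comparison of the two connections easier to repeat on the full log file, here is a small sketch that extracts the "Starting new HTTPS connection" entries and the delay between them. It assumes the key=value log format shown in the two lines above; the names PATTERN and new_connections are made up for this example.

```python
import re
from datetime import datetime

# The two log lines quoted above, verbatim.
LOG_LINES = [
    'timestamp=2022-05-23T12:27:07.970380+00:00 level=DEBUG '
    'message="Starting new HTTPS connection (1): sts.amazonaws.com:443"',
    'timestamp=2022-05-23T12:28:08.572227+00:00 level=DEBUG '
    'message="Starting new HTTPS connection (2): lambda.us-west-2.amazonaws.com:443"',
]

PATTERN = re.compile(
    r'timestamp=(?P<ts>\S+) .*Starting new HTTPS connection '
    r'\((?P<n>\d+)\): (?P<host>[^:"]+):(?P<port>\d+)'
)


def new_connections(lines):
    """Return (timestamp, counter, host) for each new-connection log line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(
                (datetime.fromisoformat(match["ts"]), int(match["n"]), match["host"])
            )
    return found


connections = new_connections(LOG_LINES)
gap = (connections[1][0] - connections[0][0]).total_seconds()

for ts, counter, host in connections:
    print(ts.isoformat(), counter, host)
print(f"gap between the two connections: {gap:.1f}s")
```

On these two lines the gap comes out at about 60.6 seconds, which would be consistent with botocore's default read timeout of 60 seconds elapsing before the retry. The number in parentheses is a counter that urllib3 keeps per connection pool (that is, per host), so it may be worth checking in the full logs whether an earlier connection (1) to the Lambda endpoint exists.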
Additional Information/Context
My Lambda invocation happens in a Django app serving a GraphQL API. The service runs in a Kubernetes cluster on GCP.
SDK version used
1.21.1
Environment details (OS name and version, etc.)
python:3.10-slim-bullseye
Hi @superlevure thanks for reaching out. I don’t think the Lambda timeout was related to your credentials in this case. There are various things that could cause a Lambda function to time out. Here is a premium support article that offers troubleshooting guidance for this: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-retry-timeout-sdk/. Please let me know if that helps.
Hi @tim-finnigan,
Thank you for your answer. I had a look at the page you sent, but I don't think this is a "real" Lambda timeout. The facts that it is 100% reproducible and that it always happens after credentials renewal make me believe it does have something to do with sts.
Have you taken a look at the Possible Solution part of my message? I'm curious to know your opinion on the point I'm raising there.
Hi @superlevure thanks for following up. The credentials documentation mentions how boto3 will handle credential refreshing in cases like this. (There's also a feature request related to proactively refreshing these credentials: https://github.com/boto/boto3/issues/2345). But invoking the Lambda service would involve a request to a Lambda endpoint.
Hi @tim-finnigan,
Thank you for the link. I can assure you that I went through all the documentation I could find on the topic before posting here, and the page you sent was part of that. Unfortunately, I'm still blocked.
The issue you mentioned is interesting, but it's a different problem. The complaint there is that the call to sts to renew the credentials is too slow. In my case I don't really care about that metric; the call to sts is fine for me (see the logs attached to my first message).
The issue for me is that the subsequent call to invoke my Lambda always times out. This call is handled automatically by boto3 after the credentials renewal, so I have no control over it, and I don't understand why it's made against an sts endpoint (see the Possible Solution part of my message). Side note: I can't see any logs in CloudWatch for this call, so I suspect it never reaches my Lambda at all.
When I invoke my Lambda and no credentials renewal is triggered, everything works as intended, 100% of the time.
I don't understand this part of your message:
But invoking the Lambda service would involve a request to a Lambda endpoint.
Can you be more specific? I am making a request to my Lambda endpoint. The request that fails is made by boto3 itself, and I can't change that.
Thanks for your help!
Hi @superlevure thanks for following up. Can you not see the CloudWatch logs due to IAM permissions or another issue? Assuming you are calling the Lambda invoke command, that involves making a call to the Lambda endpoint of your configured region (for example, lambda.us-west-2.amazonaws.com). But you can configure the session timeout when using AssumeRoleWithWebIdentity, as described in the documentation. I'm not sure if that addresses your issue, but please let me know if there are any points you want to clarify.
Since we haven't heard back in a few weeks I'm going to close this issue. Please let us know if you are still seeing this issue and if so we can revisit it.