metaflow

Expiring AWS credentials

Open tuulos opened this issue 4 years ago • 4 comments

In some environments, such as AWS Cloud9 or aws-vault, a background process periodically updates the AWS credentials in ~/.aws/credentials (or in environment variables). Currently, Metaflow/boto don't pick up these refreshed credentials, which leads to errors like:

botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.

Apparently this is a known issue in botocore too: https://github.com/boto/botocore/issues/704

tuulos, Apr 24 '21 01:04

Not sure if it's the best fit here, but I've used RefreshableCredentials from botocore in the past to deal with similar issues. It's poorly documented, but you can pass it a custom function via refresh_using that fetches fresh credentials; botocore calls it when the current ones are close to expiring. Happy to share some code snippets if useful.

https://github.com/boto/botocore/blob/develop/botocore/credentials.py#L366

russellbrooks, Apr 27 '21 21:04
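A minimal sketch of the RefreshableCredentials approach suggested above, assuming the Cloud9-style setup from the issue where a background process rewrites ~/.aws/credentials, so the refresh function simply re-reads that file. The helper names (`credentials_metadata_from_text`, `read_shared_credentials`, `make_refreshing_boto3_session`), the `method` label, and the 20-minute `lifetime_minutes` are hypothetical. Because the credentials file records no expiry time, the sketch reports a synthetic one so that botocore keeps calling back. Assigning the private `_credentials` attribute of a botocore session is a common community workaround, not a documented API, and wiring this into Metaflow's own S3 client is not shown.

```python
import configparser
import datetime
import os


def credentials_metadata_from_text(text, profile="default", lifetime_minutes=20):
    """Parse shared-credentials-file text into the dict botocore's
    RefreshableCredentials expects (access_key, secret_key, token, expiry_time)."""
    # interpolation=None so any '%' characters in secrets are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    # ASSUMPTION: the file carries no expiry, so report a synthetic one a few
    # minutes out; it only controls how often botocore re-reads the file.
    expiry = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(
        minutes=lifetime_minutes
    )
    return {
        "access_key": section["aws_access_key_id"],
        "secret_key": section["aws_secret_access_key"],
        "token": section.get("aws_session_token"),  # None if the file has no token
        "expiry_time": expiry.isoformat(),
    }


def read_shared_credentials(path="~/.aws/credentials", profile="default"):
    """Re-read the credentials file from disk on every call."""
    with open(os.path.expanduser(path)) as f:
        return credentials_metadata_from_text(f.read(), profile)


def make_refreshing_boto3_session(path="~/.aws/credentials", profile="default"):
    """Build a boto3 Session whose credentials are refreshed from the file."""
    # Imported lazily so the parsing helpers above work without boto installed.
    import boto3
    import botocore.session
    from botocore.credentials import RefreshableCredentials

    def refresh():
        return read_shared_credentials(path, profile)

    creds = RefreshableCredentials.create_from_metadata(
        metadata=refresh(),
        refresh_using=refresh,
        method="shared-credentials-file-reload",  # hypothetical free-form label
    )
    core_session = botocore.session.get_session()
    # Private attribute: a widely used workaround, not a documented API.
    core_session._credentials = creds
    return boto3.Session(botocore_session=core_session)
```

Assuming botocore's default advisory refresh window of roughly 15 minutes before expiry, a 20-minute synthetic lifetime should make a client from `make_refreshing_boto3_session().client("s3")` re-read the file about every five minutes; the exact timing may differ between botocore versions.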

This is particularly annoying when a long-running model training step fails, wasting hours of work.

[Screenshot: Screen Shot 2021-12-08 at 9.12.21 AM]

tuulos, Dec 08 '21 17:12

@tuulos: as a workaround, could we catch this type of error and force a refresh (i.e., retry, as we already do for some other errors)? That might be a simple way to handle it. Most of the retry logic is already in place, so it should mostly be a matter of catching this specific error. Or is it that even retrying won't work (i.e., a deeper refresh is needed)?

romain-intel, Dec 10 '21 17:12
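A sketch of the retry workaround suggested above, under stated assumptions: `call_with_credential_refresh`, `is_expired_token_error`, and `refresh_client` are hypothetical names, not Metaflow internals. It matches only the `ExpiredToken` code from the traceback in the issue, reading it from the `response` dict that botocore's `ClientError` carries (duck-typed here so the sketch runs without botocore installed). On the question of whether a plain retry is enough: retrying with the same client would most likely reuse its cached credentials, so the sketch builds a new client from a fresh session before each retry.

```python
import time


def is_expired_token_error(exc):
    """Return True if exc looks like a botocore ClientError with code ExpiredToken."""
    # botocore's ClientError exposes the parsed error as exc.response;
    # duck typing keeps this sketch runnable without botocore installed.
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ExpiredToken"


def call_with_credential_refresh(operation, refresh_client, max_attempts=3, delay_seconds=1.0):
    """Run operation(client); on ExpiredToken, build a new client and retry.

    refresh_client is a hypothetical zero-argument callable that returns a
    client created from a brand-new session, so credentials are resolved again.
    """
    client = refresh_client()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(client)
        except Exception as exc:
            # Re-raise anything that is not an expired token, and the final failure.
            if not is_expired_token_error(exc) or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
            client = refresh_client()
```

For example, `refresh_client` could be `lambda: boto3.Session().client("s3")`: a new `boto3.Session` resolves the credential chain again, so it re-reads ~/.aws/credentials and picks up whatever the background process last wrote.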

Note that there is already a mechanism in botocore to refresh credentials from env vars automatically. Really, the problem is specific to Cloud9, which chose a weird way to provision temporary credentials.

oavdeev, Dec 11 '21 03:12