Expiring AWS credentials
In some environments like AWS Cloud9 or aws-vault, there's a background process that updates AWS credentials at ~/.aws/credentials (or env vars) periodically. Currently, Metaflow/boto don't refresh these credentials, leading to errors like
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.
Apparently this is a known issue in boto too https://github.com/boto/botocore/issues/704
Not sure if it's the best fit here, but I've used RefreshableCredentials from botocore in the past to deal with similar issues. It's poorly documented, but you can pass it a custom function via refresh_using to update the creds as needed. Happy to share some code snippets if useful.
https://github.com/boto/botocore/blob/develop/botocore/credentials.py#L366
this is particularly annoying when a long-running model training step fails wasting hours of work

@tuulos : could we, as a workaround, catch this type of error and force a refresh (ie: retry like we do for some other errors)? This might be a good way to do it simply. The logic should be there mostly and it should be a matter of catching that specific error or is it that even retrying won't work (ie: there needs to be a deeper refresh).
Note that there is already mechanism in botocore to refresh credentials from env vars automatically. Really the problem is specific to Cloud9 which chose a weird way to provision temporary creds.