s3fs
Check for updated tokens
Currently s3fs/boto will scrape AWS environment variables in order to get credentials for access. Sometimes these credentials expire in the middle of a session. This is especially common with temporary credentials, which often have a one-hour lifetime.
It's easy to update these environment variables periodically (say, every 20 minutes), but s3fs/boto don't pick up the new values when the old ones expire.
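For reference, the credentials in question are the standard AWS environment variables that botocore reads when a session is created. A minimal sketch of the setup being described (credential values and the bucket name are placeholders):

import os
import s3fs

# Temporary STS credentials exposed via the standard AWS environment variables;
# botocore reads these when the underlying session is first created.
os.environ["AWS_ACCESS_KEY_ID"] = "ASIA..."      # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."      # placeholder
os.environ["AWS_SESSION_TOKEN"] = "..."          # placeholder

fs = s3fs.S3FileSystem()   # picks up the environment variables above
fs.ls("my-bucket")         # hypothetical bucket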
Thoughts on how best to address this?
cc @ntabris
I would assume that, in the Dask context, instances are being requested from fsspec in each task. These instances are cached, so you will always end up with the original one, with its boto session (which is good, saving on startup overhead). However, if the token has changed, it's probably enough to flush the instance cache:
s3fs.S3FileSystem.clear_instance_cache()
which will cause a new session to be made the next time an instance is required. This would not, however, solve the case of very long-lived tasks that hold on to a reference to a single filesystem instance.
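For illustration, the periodic-refresh version of this could look roughly like the following (a minimal sketch; refresh_environment_credentials and the bucket name are hypothetical):

import s3fs

def refresh_environment_credentials():
    # Hypothetical helper that re-exports fresh STS values
    # as AWS_* environment variables (e.g. from a token service).
    ...

refresh_environment_credentials()

# Drop cached filesystem instances so the next S3FileSystem() call
# builds a fresh boto session that sees the updated environment variables.
s3fs.S3FileSystem.clear_instance_cache()

fs = s3fs.S3FileSystem()   # new instance, new session
fs.ls("my-bucket")         # hypothetical bucket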
Thanks, @martindurant!
That appears to work in some but sadly not all cases (as you said).
In particular, it works when I run df.to_parquet, wait for the token to expire and be refreshed (including a call to s3fs.S3FileSystem.clear_instance_cache()), then run another df.to_parquet. Yay!
But if there's a very large df and the to_parquet call spans the token expiration/refresh, then the tasks are still failing (with ExpiredToken errors). I chatted with @ian-r-rose and his guess is that the tasks are getting deserialized early and that this includes the old boto session.
My guess is that this would be best addressed by some changes in _call_s3 (https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L324-L334) so there's a retry with a new boto session if we get an ExpiredToken error.
Does that seem right? Any suggestions of other/better ways to address this?
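For the sake of discussion, the retry idea could look roughly like this (a minimal sketch only, not the actual s3fs code; call and rebuild_client are hypothetical stand-ins for the single client operation and for re-creating the client from refreshed credentials):

import botocore.exceptions

async def call_s3_with_refresh(call, rebuild_client):
    # Try the S3 operation once; if the token expired mid-way,
    # rebuild the client from the (refreshed) credentials and retry once.
    try:
        return await call()
    except botocore.exceptions.ClientError as exc:
        if exc.response.get("Error", {}).get("Code") != "ExpiredToken":
            raise
        await rebuild_client()  # new boto session picks up the refreshed credentials
        return await call()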
Perhaps something like https://stackoverflow.com/a/69226170/3821154 ?
I think I'm not following the suggestion. Are you thinking that's a workaround that doesn't require changing s3fs? Or that s3fs could do something like that?
(My feeling is that this approach doesn't work as well if we're setting STS token credentials as env vars, but maybe I'm just missing how this would work.)
s3fs has a similar session object, and the linked code suggests that you can simply make a new credentials object and set the attribute. Yes, I would imagine this happening in the except block within _call_s3.
Note also that expired credentials are not necessarily refreshable - for example, my corporate account requires 2FA.
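To spell out the idea from the linked answer, the botocore version of the pattern is roughly the following (a minimal sketch; fetch_temporary_credentials is a hypothetical helper that returns fresh STS values, and note that _credentials is a private attribute of the session). s3fs itself sits on aiobotocore, where the analogous class is AioRefreshableCredentials, mentioned further down the thread.

from botocore.credentials import RefreshableCredentials
from botocore.session import get_session

def fetch_temporary_credentials():
    # Hypothetical helper that obtains fresh STS credentials,
    # e.g. from sts.assume_role or an external token service.
    return {
        "access_key": "...",
        "secret_key": "...",
        "token": "...",
        "expiry_time": "2030-01-01T00:00:00Z",   # ISO 8601 expiry
    }

session = get_session()
# Swap in credentials that call the fetcher again automatically
# whenever the current values are close to expiry.
session._credentials = RefreshableCredentials.create_from_metadata(
    metadata=fetch_temporary_credentials(),
    refresh_using=fetch_temporary_credentials,
    method="sts-assume-role",
)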
s3fs has a similar session object, and the linked code suggests that you can simply make a new credentials object and set the attribute. Yes, I would imagine this happening in the except block within _call_s3.
To make sure I understand, you're suggesting that this should be fixed by adding a try-except block within _call_s3 in the s3fs code linked to above? Is that correct?
Try-catching, and then refreshing the standard session. Or use AioRefreshableCredentials instead, or https://github.com/aweber/aiobotocore-refreshable-credentials - that should expire/rotate credentials based on a TTL on their own.
And I am positive this will also cover the 2FA credentials that @martindurant has: if your credentials expire - they, well, expire :) so the operation should throw an exception when they are expired.
@spawn-guy, sounds like you have a plan - would you like to make a PR?