Add an attribute to S3Config to refresh S3 credentials
Is your feature request related to a problem? Please describe.
Currently, S3 credentials specified in S3Config are static. However, a dataframe may perform its reads long after it was created, i.e., when collect is called, by which time the credentials may have expired. Hence, we need a way to dynamically refresh the S3 credentials passed in S3Config.
Describe the solution you'd like
provider = lambda x: ...  # fetch credentials
s3 = S3Config(
    credentials_provider=provider,
    retry_mode="adaptive",
    num_tries=BOTO_MAX_RETRIES,
    max_connections=DAFT_MAX_S3_CONNECTIONS_PER_FILE,
)
df = daft.read_parquet(io_config=s3)
# ... after credentials expire.
df.collect()  # This call would refresh credentials if they have expired
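To make the intended behavior concrete, here is a minimal sketch of the kind of callable that could be passed as the provider. It does not use Daft at all: Credentials, RefreshingProvider, the fetch callback, and the 5-minute margin are all hypothetical names and values chosen for illustration, and the zero-argument call signature is an assumption rather than a confirmed API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical credential record for illustration; not a Daft type."""
    key_id: str
    access_key: str
    session_token: Optional[str]
    expiry: datetime  # timezone-aware UTC timestamp


class RefreshingProvider:
    """Caches credentials and re-fetches them once they are close to expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin: timedelta = timedelta(minutes=5),
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ):
        self._fetch = fetch      # user-supplied function that obtains new credentials
        self._margin = margin    # refresh this long before the actual expiry
        self._now = now          # injectable clock, which makes testing easy
        self._cached: Optional[Credentials] = None

    def __call__(self) -> Credentials:
        # Fetch on first use, or when the cached credentials are about to expire.
        if self._cached is None or self._now() >= self._cached.expiry - self._margin:
            self._cached = self._fetch()
        return self._cached
```

With something like this, every read that happens at collect time would ask the provider for credentials and only trigger a real fetch when the cached ones are stale.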
Describe alternatives you've considered
I don't see any alternatives apart from setting long-lived credentials, which is a security risk.
Hey @raghumdani! Not sure if I closed the loop on this, but we implemented and merged this functionality in #2233. You can use it by passing a function to the credentials_provider parameter when creating a daft.io.S3Config object. Documentation is here!