
Add an attribute to S3Config to refresh S3 credentials

Open raghumdani opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe. Currently, S3 credentials specified in S3Config are static. However, a dataframe may perform its reads long after it was created, i.e., when collect is called, by which time the credentials may have expired. Hence, we need a way to dynamically refresh the S3 credentials passed in S3Config.

Describe the solution you'd like

provider = lambda x: ...  # fetch credentials

s3 = S3Config(
    credentials_provider=provider,
    retry_mode="adaptive",
    num_tries=BOTO_MAX_RETRIES,
    max_connections=DAFT_MAX_S3_CONNECTIONS_PER_FILE,
)

df = daft.read_parquet(io_config=s3)

# ... after credentials expire.

df.collect() # This call would refresh credentials if they have expired
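
To make the "refresh if expired" behaviour above concrete, here is a minimal, standard-library-only sketch of a provider that caches credentials and re-fetches them shortly before they expire. The names `Credentials`, `RefreshingProvider`, `fetch`, `margin_s`, and `clock` are hypothetical and introduced only for illustration; they are not Daft types or parameters, and the value returned would need to be adapted to whatever the `credentials_provider` hook expects.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical credential record (not a Daft type)."""
    key_id: str
    access_key: str
    session_token: str
    expires_at: float  # Unix timestamp in seconds


class RefreshingProvider:
    """Callable that caches credentials and re-fetches them near expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin_s: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch          # user-supplied function that obtains fresh credentials
        self._margin_s = margin_s    # refresh this many seconds before the actual expiry
        self._clock = clock          # injectable clock, so the logic can be tested
        self._cached: Optional[Credentials] = None

    def __call__(self) -> Credentials:
        now = self._clock()
        # Fetch on first use, or when the cached credentials are about to expire.
        if self._cached is None or self._cached.expires_at - self._margin_s <= now:
            self._cached = self._fetch()
        return self._cached
```

With something like this, the expensive fetch happens only on the first call and again once the cached credentials enter the safety margin; every other call reuses the cached value. The sketch is not thread-safe, so concurrent callers would need a lock around the refresh.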

Describe alternatives you've considered I don't see any alternatives other than setting long-lived credentials, which is a security risk.


raghumdani avatar Feb 20 '24 02:02 raghumdani

Hey @raghumdani! Not sure if I closed the loop on this, but we implemented and merged this functionality in #2233. You can use it by passing a function to the credentials_provider parameter when creating a daft.io.S3Config object. Documentation is here!

kevinzwang avatar Jun 18 '24 01:06 kevinzwang