feat(duckdb): set secrets via SecretsManager
Discussed in https://github.com/ibis-project/ibis/discussions/9633
Originally posted by andersbogsnes July 18, 2024 Now that DuckDB has implemented a proper credentials manager, is there some way of supporting that through Ibis natively?
https://duckdb.org/docs/configuration/secrets_manager
Today you'd either have to go via FSSpec which is going to be slower than native Duckdb, or make sure to run a .raw_sql('CREATE SECRET ...') beforehand.
Maybe some kind of set_credentials option for the backend?
Wonder if we could also or instead of set_credentials add secrets kwargs to connect?
This is something that came up for me recently! I used the .raw_sql('CREATE SECRET ...') approach with the DuckDB backend to connect to an S3-compatible storage provider.
Other backends, like Snowflake and Microsoft SQL Server, support creating SECRET/CREDENTIAL, but they are used slightly differently than the DuckDB implementation.
- https://docs.snowflake.com/en/sql-reference/sql/create-secret
- https://learn.microsoft.com/en-us/sql/t-sql/statements/create-credential-transact-sql
I share this in hopes that it may help determine a method name or if this should be supported via kwargs on connect instead. 💭
is .raw_sql still our only option for any S3 secrets?
@cboettig -- no, DuckDB currently uses the AWS SDK for authentication, but it can be a bit spotty in what is supported.
The most reliable way is to set access key, secret key and (critically) region/default region via environment variables.
@gforsyth thanks! sorry I don't entirely follow.
e.g. this works (bucket is public, should be reproducible):
import ibis
con = ibis.duckdb.connect()
con.raw_sql(f'''
CREATE OR REPLACE SECRET secret (
TYPE S3,
ENDPOINT 'sdsc.osn.xsede.org',
URL_STYLE 'path'
);
''')
path = "s3://bio230014-bucket01/challenges/forecasts/bundled-parquet/project_id=neon4cast"
con.read_parquet(path + "/**")
but is quite verbose and not particularly pythonic. I wasn't clear how to do this with env vars. What's the best way to do this in ibis at the moment?
Hey @cboettig !
I thought you were thinking of secrets for accessing private buckets, in which case, usually setting these environment variables does the trick: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-envvars.html
However, while there is AWS_ENDPOINT_URL for specifying custom endpoints, I don't know if there's a way to tell the AWS SDK to use a specific URL_STYLE with an environment variable, so you might just be stuck with the non-pythonic method for now.
@gforsyth Thanks for the reply!
yeah, looks like we're stuck. The AWS SDK supports addressing_style in config files (https://docs.aws.amazon.com/cli/latest/topic/s3-config.html) but I can't get duckdb to read config directly (??). I know gdal supports AWS_VIRTUAL_HOSTING arg for this. arrow is the best with this, automatically turning virtual hosting on only if an endpoint override is unset (https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html).