
Unable to connect to duckdb database residing on S3

fRoDdYy opened this issue 2 years ago · 16 comments

Referring to the documentation, I am trying to connect to my duckdb database by attaching it:

outputs:
  dev:
    type: duckdb
    path: tmp/dbt.duckdb
    extensions:
      - httpfs
      - parquet
    settings:
      s3_region: ap-south-1
      s3_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
      s3_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
      s3_session_token: "{{ env_var('AWS_SESSION_TOKEN') }}"
    attach:
      - path: "s3://bucket-name/file-path/database.duckdb"
        read_only: true
    threads: 1
    external_root: s3://bucket-name/output
target: dev

I get the following error:

Runtime Error Catalog Error: Cannot open database "s3://bucket-name/file-path/database.duckdb" in read-only mode: database does not exist

fRoDdYy · Sep 13 '23

Huh; the path in the attach statement doesn't match the path in the error and it's not totally obvious to me why-- will take a look.

jwills · Sep 13 '23

The s3 attach via httpfs story is a bit of a strange one right now - I think you can only set auth via environment variables or query parameters
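
For the query-parameter route, I'd guess at something like this (a sketch, untested; the parameter names are the httpfs settings from the DuckDB docs, and whether ATTACH honors them is part of what's unclear here):

ATTACH 's3://bucket-name/file-path/database.duckdb?s3_region=ap-south-1&s3_access_key_id=KEY&s3_secret_access_key=SECRET'
AS remote_db (READ_ONLY);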

Mause · Sep 13 '23

ack, thank you @Mause! I swear I've made it work before, let me see if I can sort out how.

jwills · Sep 13 '23

@Mause I tried it with environment variables too, but the issue persists.

@jwills Yeah, sorry, I made that mistake while typing; I've corrected it now.

fRoDdYy · Sep 13 '23

Maybe double check the environment variable names? https://duckdb.org/docs/extensions/httpfs#configuration-1

Mause · Sep 13 '23

I am also interested in this use case. I'm trying to perform a query on a DuckDB database in S3. I'm pretty sure my env vars are correct because I'm able to run a query like:

with t as (
	SELECT *
	FROM iceberg_scan('s3a://path/to/files', allow_moved_paths = true)
)
SELECT *
from t;

without issue.

I've set my S3 credentials (via SET statements) like this:

SET s3_region = 'us-east-1';
SET s3_access_key_id = 'access_key_id';
SET s3_secret_access_key = 'secret_access_key';

and I've also executed the following:

INSTALL httpfs;
INSTALL iceberg;

LOAD httpfs;
LOAD iceberg;

(Iceberg isn't directly related to this problem; just noting it.)

I also get the error:

Catalog Error: Cannot open database "s3://path/to/test.duckdb" in read-only mode: database does not exist

I know my repro isn't using dbt-duckdb, but I'm also interested in this use case with dbt-duckdb (was just testing with SQL to verify behavior outside of dbt-duckdb).

NatElkins · Jan 31 '24

Ah appreciate that @NatElkins -- I wonder if it works if you use the fsspec stuff? So as to treat S3 like a filesystem as opposed to trying to use the httpfs route?

jwills · Jan 31 '24

@jwills It doesn't seem to work, although I may be doing something wrong. I have the creds defined in my ~/.aws/credentials file.

import duckdb
from fsspec import filesystem

# Register an fsspec-backed S3 filesystem with DuckDB,
# then try to open the database file through it
duckdb.register_filesystem(filesystem('s3', anon=False))
con = duckdb.connect("s3://bucket/path/to/test.duckdb")
r1 = con.sql("select * from test")
print(r1)

The error I get is:

duckdb.connect("s3://bucket/path/to/test.duckdb")
duckdb.duckdb.IOException: IO Error: Cannot open file "/Users/nathanielelkins/Projects/dbt_test/s3://bucket/path/to/test.duckdb": No such file or directory

Note that DuckDB is prepending my working directory, i.e. it treats the s3:// URL as a relative local path instead of routing it through the registered filesystem.

NatElkins · Feb 01 '24

Yep, right there with you-- I cannot figure out how to get this to work right now. 😞

jwills · Feb 01 '24

I stumbled across this thread while searching for the same issue, and I managed to make this work using the Secret provider system (see the S3 API Support page in the DuckDB docs).

The issue seems to come from setting credentials with statements like SET s3_endpoint = ...; a secret works where those do not.

On version 0.10.3, I created a secret like:

INSTALL httpfs;
LOAD httpfs;
CREATE SECRET secret1 (
    TYPE S3,
    ENDPOINT '***',
    KEY_ID '****',
    SECRET '****',
    REGION '****'
);

then performed the ATTACH statement:

ATTACH 's3://path/to/dbfile'
AS db (READ_ONLY);

and it worked perfectly.
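
For the original dbt-duckdb setup, I believe recent dbt-duckdb versions expose the same secrets mechanism directly in profiles.yml. A minimal sketch, assuming the secrets profile key is available in your version (double-check the dbt-duckdb README):

outputs:
  dev:
    type: duckdb
    path: tmp/dbt.duckdb
    extensions:
      - httpfs
    secrets:
      - type: s3
        region: ap-south-1
        key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
        secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
    attach:
      - path: "s3://bucket-name/file-path/database.duckdb"
        read_only: true
target: dev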

AldricVS · Jun 03 '24