
unable to use httpfs extension in node

Open · declann opened this issue 11 months ago · 1 comment

What happens?

I got the bare-node example from the repo working.

But with duckdb-wasm 1.28, 1.29.0, and 1.29 dev, I can't get the httpfs extension working in Node.

Ideally a query like this:

await conn.query(`select * from 'https://blobs.duckdb.org/data/taxi_2019_04.parquet' limit 1;`);

would just work, as it does in the browser.

But from bare-node it fails with this error: Error: IO Error: No files found that match the pattern "select * from 'https://blobs.duckdb.org/data/taxi_2019_04.parquet' limit 1;"

I tried adding INSTALL httpfs and LOAD httpfs, and also LOAD '[path to a local download of https://extensions.duckdb.org/v1.2.1/wasm_eh/httpfs.duckdb_extension.wasm]', but could not get anything to work. I also tried multiple Node versions.
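A possible workaround, sketched here under assumptions and not confirmed by the maintainers, is to skip httpfs entirely: download the file with Node's built-in fetch (Node 18+), hand the bytes to DuckDB-Wasm with AsyncDuckDB.registerFileBuffer, and then query the registered file name. The helper name registerRemoteFile is hypothetical, and its fetchImpl parameter exists only so it can be exercised without network access.

```javascript
// Hypothetical workaround helper (not part of the duckdb-wasm API):
// download a remote file with fetch and register its bytes in DuckDB-Wasm's
// virtual file system, so SQL can read it by `name` without httpfs.
async function registerRemoteFile(db, url, name, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  // The whole file is buffered in memory before it is handed to DuckDB.
  const bytes = new Uint8Array(await res.arrayBuffer());
  await db.registerFileBuffer(name, bytes);
  return bytes.byteLength;
}

// Assumed usage inside the repro, after db.instantiate(...) and db.connect():
//   await registerRemoteFile(db,
//     'https://blobs.duckdb.org/data/taxi_2019_04.parquet', 'taxi_2019_04.parquet');
//   await conn.query(`select * from 'taxi_2019_04.parquet' limit 1;`);
```

Because the entire file is downloaded up front, this only makes sense for files that fit comfortably in memory; it trades the partial reads a remote-file reader could do for not depending on httpfs at all.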

To Reproduce

npm init -y
npm i --save @duckdb/duckdb-wasm web-worker

Copy the following to index.cjs. It is the same as the bare-node example in the repo, with only one new line, marked NEW LINE. (Note: it only works here because of the npm i command above; web-worker is missing from the example's package.json.)

const duckdb = require('@duckdb/duckdb-wasm');
const path = require('path');
const Worker = require('web-worker');
const DUCKDB_DIST = path.dirname(require.resolve('@duckdb/duckdb-wasm'));

(async () => {
    try {
        const DUCKDB_CONFIG = await duckdb.selectBundle({
            mvp: {
                mainModule: path.resolve(DUCKDB_DIST, './duckdb-mvp.wasm'),
                mainWorker: path.resolve(DUCKDB_DIST, './duckdb-node-mvp.worker.cjs'),
            },
            eh: {
                mainModule: path.resolve(DUCKDB_DIST, './duckdb-eh.wasm'),
                mainWorker: path.resolve(DUCKDB_DIST, './duckdb-node-eh.worker.cjs'),
            },
        });

        const logger = new duckdb.ConsoleLogger();
        const worker = new Worker(DUCKDB_CONFIG.mainWorker);
        const db = new duckdb.AsyncDuckDB(logger, worker);
        await db.instantiate(DUCKDB_CONFIG.mainModule, DUCKDB_CONFIG.pthreadWorker);

        const conn = await db.connect();
        await conn.query(`SELECT count(*)::INTEGER as v FROM generate_series(0, 100) t(v)`);
        await conn.query(`select * from 'https://blobs.duckdb.org/data/taxi_2019_04.parquet' limit 1;`); // NEW LINE

        await conn.close();
        await db.terminate();
        await worker.terminate();
    } catch (e) {
        console.error(e);
    }
})();
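Since web-worker is missing from the example's package.json, a small pre-flight check can report any required package that does not resolve before DuckDB-Wasm is started. This is a hypothetical helper (missingPackages is an illustrative name) that relies only on Node's require.resolve.

```javascript
// Hypothetical pre-flight check: return the subset of `names` that
// require.resolve cannot find from this script's location.
function missingPackages(names) {
  return names.filter((name) => {
    try {
      require.resolve(name);
      return false; // resolvable: not missing
    } catch {
      return true; // resolution failed: package is missing
    }
  });
}

const missing = missingPackages(['@duckdb/duckdb-wasm', 'web-worker']);
if (missing.length > 0) {
  console.error(`Missing packages: ${missing.join(', ')}. Try: npm i ${missing.join(' ')}`);
}
```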

Browser/Environment:

Node v22.12.0

Device:

Ubuntu

DuckDB-Wasm Version:

1.29.0 (also seen with 1.28 and 1.29 dev)

DuckDB-Wasm Deployment:

from @duckdb/duckdb-wasm on npm

Full Name:

Declan Naughton

Affiliation:

DCN Consulting

declann · Apr 14 '25 13:04

I did notice this in the docs:

The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. Https protocol capabilities needs to go through an additional layer, the browser, which adds both differences and some restrictions to what is doable from native.

Instead, DuckDB-Wasm has a separate implementation that for most purposes is interchangeable, but does not support all use cases (as it must follow security rules imposed by the browser, such as CORS). Due to this CORS restriction, any requests for data made using the HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data. The MDN website is a great resource for more information regarding CORS.

But my reading is that, since the server allows CORS, my query should work outside the browser (in Node) just as it does in the browser. Maybe I'm wrong here?

declann · Apr 14 '25 14:04
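The docs passage ties remote reads in DuckDB-Wasm to browser rules such as CORS. To see what a given server actually advertises, a small diagnostic can send a HEAD request and inspect two headers: Access-Control-Allow-Origin (whether a page on a given origin may read the response) and Accept-Ranges (whether the server advertises HTTP range requests, which let a reader fetch only parts of a remote Parquet file). This is a hypothetical helper, assuming Node 18+ for the built-in fetch; probeRemoteFile and its result fields are illustrative names, not part of any library. Treat a false result as a hint, not proof: some servers only send CORS headers when the request carries an Origin header, and a server can honor Range requests without sending Accept-Ranges.

```javascript
// Hypothetical diagnostic: issue a HEAD request and summarize the CORS and
// range-request headers a browser-side reader would depend on.
async function probeRemoteFile(url, origin, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, { method: 'HEAD' });
  const allowOrigin = res.headers.get('access-control-allow-origin');
  const acceptRanges = res.headers.get('accept-ranges');
  return {
    status: res.status,
    // A page on `origin` may read the response only if the server allows
    // any origin ("*") or echoes that exact origin.
    corsAllowsOrigin: allowOrigin === '*' || allowOrigin === origin,
    // "bytes" means the server advertises support for HTTP Range requests.
    supportsRanges: acceptRanges !== null && acceptRanges.toLowerCase().includes('bytes'),
  };
}

// Assumed usage (the origin below is a placeholder for the page hosting DuckDB-Wasm):
//   probeRemoteFile('https://blobs.duckdb.org/data/taxi_2019_04.parquet',
//                   'https://example.com').then(console.log);
```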