Add OPFS support
This PR adds OPFS (Origin Private File System) support for duckdb-wasm. With OPFS support, we could persist the database in the browser 🔥
API
To remain simple and intuitive, no new api is added, just pass a path starting with opfs:// to db.open and we'll know we're using OPFS and finish all the preparation work needed:
await db.open({
path: 'opfs://test.db',
accessMode: duckdb.DuckDBAccessMode.READ_WRITE,
});
const conn = await db.connect();
List of Changes
- Add OPFS Support
- Improve Debuggability: use
-goption to enable all DWARF debugging capabilities (before this, we could breakpoint but debug symbol information are missing) - Fix the coi worker build: the
duckdb-browser-coi.pthread.worker.tsoverride the global onmessage handler but forgot to setstartWorkerin theloadcmd (when missing, the coi worker will not start properly) - Update examples/esbuild-browser: add opfs example & support serving coi build
- Add tests
Thanks a lot for this PR! I will have to go properly through it and do a review, but looks a very cool addition
This PR seems to me addresses 2 separate features, OPFS and support for threads in the example.
I think it's clearer to have them land separately, I did a partial cherry-pick of this PR in #1501.
OPFS part I would need to more closely review and experiment with, but I will get to it in the next days.
Thanks a lot! I just did a rebase to resolve the conflict in examples/esbuild-browser/package.json after the partial cherry-pick
@carlopi Hi, checking in on the progress here, any updates? or any assistance i can offer?
Howdy @dengkunli! We are pretty excited about this capability - thank you! One quick question - does this also enable you to query a file that is located in OPFS? Could this enable a user to drop in a .csv file for example, and then query that file with DuckDB Wasm in a way that persists across browser sessions? If so, mind adding a test to show that sweet feature working?
Hi @dengkunli, sorry for the delay, work was undergoing in other areas of the code, and this has not gotten yet the required attention. I will send a proper review in the next few days, but this is very interesting to have in duckdb-wasm.
@Alex-Monahan of course! I've pushed a commit adding tests including:
- export csv to OPFS
- import csv from OPFS
- import parquet from OPFS using
read_parquetfunction (works as well without read_parquet function)
@carlopi Thanks a lot for taking the time to review !
does this PR merged in release version? any update?
Is there any update on getting this merged/released? We are currently bringing in our own version of the package which is not ideal.
I understand you are open source but if the PR is ready then can we get an ETA on merging it please?
Is there any update on getting this merged/released? We are currently bringing in our own version of the package which is not ideal.
I understand you are open source but if the PR is ready then can we get an ETA on merging it please?
Hi, I am also interested to know the ETA. Thanks
I am experimenting with this PR, and tried to ATTACH a database stored within the OPFS (note, this database doesn't exist yet - I want to create it as a part of the attach operation). It couldn't find a registered file handle. Any ideas on what needs to be added to support attaches?
Ex: ATTACH 'opfs://my_db_to_attach.duckdb' AS woot
I am experimenting with this PR, and tried to
ATTACHa database stored within the OPFS (note, this database doesn't exist yet - I want to create it as a part of the attach operation). It couldn't find a registered file handle. Any ideas on what needs to be added to support attaches? Ex:ATTACH 'opfs://my_db_to_attach.duckdb' AS woot
There have been very recent rework of the file system in mainline duckdb, see https://github.com/duckdb/duckdb/pull/11297 or https://github.com/duckdb/duckdb/pull/11343. Part of the motivation for those changes is more powerful support for attaching remote databases in duckdb. Those changes will be releases as duckdb v0.10.2.
Those changes of the interface are not compatible with current duckdb-wasm, that will need to be slightly changed to support those new APIs. I do not know, but it's possible that with the newer interface more stuff will work out of the box, or possible some different methods needs to be supported.
First and foremost: awesome work/library, thank you so much!! On the topic of this PR: would love to use this feature asap, even if it is released as experimental and/or there are upcoming changes that may (or may not) streamline this use-case. A bird in the hand is worth two in the bush? It seems like there's high demand for this feature based on:
- https://github.com/duckdb/duckdb-wasm/discussions/1689 (created 7hrs ago)
- https://github.com/duckdb/duckdb-wasm/discussions/1444
- https://github.com/duckdb/duckdb-wasm/discussions/1433
- https://github.com/duckdb/duckdb-wasm/discussions/1441
- https://github.com/duckdb/duckdb-wasm/discussions/1387
- https://github.com/duckdb/duckdb-wasm/discussions/1230 (1 year ago)
🙏
Hi @Alex-Monahan . @ravwojdyla and @carlopi
Actually we have maintained a DuckDB WASM with OPFS support since last year October. It seems like that it supports the future changes @carlopi mentioned. And we used it for our app few months. You can build and use it if you want, there are scripts in these two repos for quickly building and you can build it by yourself to exclude our custom extension. (https://github.com/datadocs/duckdb-wasm/compare/7c494e743bca3903b0af7c2892d2dc8538a0dc4e...5bf5ab43503bfea6a1be17ecbd020c2ec0a8179a)
And our forked version support opening db, attaching db, reading/writing files from OPFS. and it is similar to this PR:
import { createOPFSFileHandle } from '@datadocs/duckdb-wasm';
...
const dirHandle = await navigator.storage.getDirectory();
const dbHandle = await createOPFSFileHandle(`attach.db`, dirHandle, {
create: true,
emptyAsAbsent: true,
});
const walHandle = await createOPFSFileHandle(`attach.db.wal`, dirHandle, {
create: true,
emptyAsAbsent: true,
});
...
await duckdbConnection.query('ATTACH 'attach.db');
The repositories at here:
- https://github.com/datadocs/duckdb-wasm
- https://github.com/datadocs/duckdb
- https://github.com/datadocs/duckdb-wasm-dist
I originally intended to create these support as a PR into DuckDB repository. However, after several months app testing, I found that this PR already exists.
Those changes of the interface are not compatible with current duckdb-wasm, that will need to be slightly changed to support those new APIs. I do not know, but it's possible that with the newer interface more stuff will work out of the box, or possible some different methods needs to be supported.
@carlopi Got it, thanks for the info, I'll provide an update accordingly
Hi @dengkunli! Do you need any help on getting this over the finish line?
Hi @dengkunli! Do you need any help on getting this over the finish line?
yes actually, following recent rework of the file system in mainline duckdb, i've work out the mvp & eh build, but not the coi build (cross-origin isolation build that uses pthreads). In coi mode, an error is frequently thrown (not always, but very likely) in pthread worker: the error occurs when js invokes duckdb_web_fs_get_file_info_by_id and parse the result string as json, when parsing, instead of receiving a valid json, it receives a malformed json whose first character is not { and instead is \u0000, example: \u0000"cacheEpoch":3,"fileId":1,"fileName":"opfs://test.db","fileSize":12288.0,"dataProtocol":3.0,"dataUrl":"opfs://test.db","collectStatistics":false}
i've spent some time digging in, but could not find the cause, any idea what kind of error it may be?
Thanks a lot @hamilton @carlopi
@carlopi have you approved this merge or should we work from the branch?
Based on this article, it may be best to have a separate OPFS DB per web worker and not share it scross browser tabs: https://www.notion.so/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite
Based on this article, it may be best to have a separate OPFS DB per web worker and not share it scross browser tabs: https://www.notion.so/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite
That would be a huge mistake IMO, for the same reason they didn't do that in the article. Multiple tabs should be able to share the same backing dataset, without duckdb-wasm enforcing a per tab restriction. That being said, while it might be nice for this package to handle concurrency internally, I think it would be preferable to ship something first, with the caveat being that users have to manage concurrency / race conditions themselves.
any updates on this PR? What's required to land this?
@dengkunli @derekperkins is there any reason that this isn't yet merged? Happy to pitch in if there's any additional work needed to land this in the next couple weeks.
gentle bump @carlopi
If @hangxingliu has a working implementation, and this still requires significant reworking with the v1 filesystem changes, should we look at just upstreaming that?
If @hangxingliu has a working implementation, and this still requires significant reworking with the v1 filesystem changes, should we look at just upstreaming that?
Sorry @derekperkins , I just read all changes between our forked version and the latest code in this repo, and confirmed that our implementation is also outdated with the latest DuckDB's version. But the good news is that the migration is not very difficult and we will update our code recently because our app also needs to use the latest version of DuckDB as one of frontend storage solution. But I am not sure does DuckDB official team has any plan to implement this feature. We would use official solution if they have.
@hangxingliu thanks for the update. If this PR isn't updated by the time you update your fork, you should submit a separate PR. Maybe we can take the best ideas from both.
@derekperkins upstreaming the changes made by @hangxingliu would be ideal. I'm very interested in having some kind of built-in persistence story, and I think there have been a few others threads on GH about this - would make duckdb-wasm much more powerful!